PATENT APPLICATION 
RECOMBINANT CHALCOMYCIN POLYKETIDE SYNTHASE AND MODIFYING GENES 

STATEMENT OF GOVERNMENT INTEREST 
[0001] Subject matter disclosed in this application was made, in part, with government 
support under NIH Grant No. R43 CA AI50305. As such, the United States government may 
have certain rights in this invention. 

CROSS-REFERENCE TO RELATED APPLICATIONS 
[0002] The present application claims benefit of U.S. provisional patent application nos. 
60/405,194 (filed 21 August 2002); 60/420,994 (filed 24 October 2002); and 60/493,966 (filed 8 
August 2003) each of which is incorporated herein by reference in its entirety. 

FIELD OF THE INVENTION 
[0003] The invention relates to recombinant polynucleotides that encode polypeptides or 
domains of the chalcomycin polyketide synthase gene cluster. Accordingly, the present 
invention is directed to the production of chalcomycin PKS enzymes, to polynucleotides that 
encode such enzymes, and to host cells that contain such polynucleotides. Further enhancements 
in the biological activities of chalcomycin and other polyketides, through production of 
derivatives thereof, are also made possible according to the practice of the invention by providing 
P450 hydroxylases that provide attachment points on the polyketide molecule for further 
modifications. Thus the present invention relates to the fields of molecular biology, chemistry, 
recombinant DNA technology, medicine, animal health, and agriculture. 

BACKGROUND OF THE INVENTION 
[0004] Polyketides represent a large family of diverse compounds synthesized from 2-carbon 
units through a series of condensations and subsequent modifications. Polyketides occur in 
many types of organisms including fungi and mycelial bacteria, in particular the actinomycetes. 
An appreciation for the wide variety of polyketide structures and for their biological activities 
may be gained upon review of the extensive art, for example, published PCT Patent Publication 
WO 95/08548 and United States Patent Nos. 5,672,491 and 6,303,342. 

[0005] Polyketides are synthesized in nature by polyketide synthases ("PKS"). The Type I or 
modular PKS comprise a set of separate catalytic active sites; each active site is termed a 
"domain", and a set thereof is termed a "module". One module exists for each cycle of carbon 
chain elongation and modification. Figure 9 of aforementioned WO 95/08548 depicts a typical 
Type I PKS, in this case 6-deoxyerythronolide B synthase ("DEBS"), which is involved in the 
production of erythromycin. Six separate modules, each catalyzing a round of condensation and 
modification of a 2-carbon unit, are present in DEBS. The number and type of catalytic domains 
that are present in each module varies based on the needed chemistry, and the total of 6 modules 
is provided on 3 separate polypeptides (designated DEBS-1, DEBS-2, and DEBS-3, with 2 
modules per each polypeptide). Each of the DEBS polypeptides is encoded by a separate open 
reading frame (gene); see Caffrey et al., FEBS Letters, 304, pp. 205, 1992. DEBS provides a 
representative example of a Type I PKS. In DEBS, modules 1 and 2 reside on DEBS-1, modules 
3 and 4 on DEBS-2, and modules 5 and 6 on DEBS-3, wherein module 1 is defined as the first 
module to act on the growing polyketide backbone, and module 6 the last. 
[0006] The minimal PKS module is typified by module 3 of DEBS which contains a 
ketosynthase ("KS") domain, an acyltransferase ("AT") domain, and an acyl carrier protein 
("ACP") domain. These three enzyme activities are sufficient to activate a 2-carbon extender 
unit and attach it to the growing polyketide molecule. Additional domains that may be included 
in a module relate to reactions other than the actual condensation, and include domains for a 
ketoreductase activity ("KR"), a dehydratase activity ("DH"), and an enoylreductase activity 
("ER"). With respect to DEBS-1, the first module thereof also contains an additional set of the 
AT and ACP activities because that module catalyzes initial condensation, and so begins with a 
"loading domain" (sometimes referred to as a loading module) that contains an AT and ACP, 
that bind the starter unit. The "finishing" of the 6-deoxyerythronolide molecule is regulated by a 
thioesterase activity ("TE") in module 6 that catalyzes cyclization of the macrolide ring during 
release of the product of the PKS. 
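
The modular organization described above can be illustrated with a small data structure. The following Python fragment is a minimal sketch only: the names Module, MINIMAL_DOMAINS, OPTIONAL_DOMAINS, and debs_polypeptide are hypothetical, and the fragment encodes only what is stated in the text (the minimal KS/AT/ACP domain set, the optional KR/DH/ER domains, and the assignment of DEBS modules 1-6 to DEBS-1, DEBS-2, and DEBS-3).

```python
from dataclasses import dataclass

# Domains sufficient to activate an extender unit and attach it to the chain.
MINIMAL_DOMAINS = frozenset({"KS", "AT", "ACP"})
# Optional domains for reactions other than the condensation itself.
OPTIONAL_DOMAINS = frozenset({"KR", "DH", "ER"})


@dataclass(frozen=True)
class Module:
    """Hypothetical record for one PKS extender module."""
    number: int
    polypeptide: str
    domains: frozenset

    def is_valid(self) -> bool:
        # An extender module must carry at least the minimal domain set.
        return MINIMAL_DOMAINS <= self.domains

    def is_minimal(self) -> bool:
        # A minimal module carries none of the optional KR/DH/ER domains.
        return self.is_valid() and not (self.domains & OPTIONAL_DOMAINS)


def debs_polypeptide(module_number: int) -> str:
    """Map DEBS modules 1-6 to DEBS-1, DEBS-2, DEBS-3 (two modules each)."""
    if not 1 <= module_number <= 6:
        raise ValueError("DEBS has modules 1 through 6")
    return f"DEBS-{(module_number + 1) // 2}"


# Module 3 of DEBS is described in the text as the typical minimal module.
module3 = Module(3, debs_polypeptide(3), MINIMAL_DOMAINS)
```

In this sketch, a module carrying KR, DH, or ER in addition to the minimal set is still valid but is no longer reported as minimal.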

[0007] PKS genes can be engineered in a variety of ways to achieve biosynthesis of 
polyketides. For instance, PKS genes can be inserted into a heterologous host to make a 
polyketide in a host that does not make it naturally. Polyketides can also be made by hybrid or 
otherwise altered PKSs or polyketide biosynthetic gene clusters. Also, polyketides can be 
overproduced by increasing the pools of available starting polyketide biosynthetic precursors and 
by other means. See U.S. Pat. Nos. 5,672,491; 5,962,290; 6,080,555; 6,391,594; and 6,221,641 
and PCT Patent Publications 00/47724, 01/27306, and 01/31035. 

[0008] Chalcomycin is a 16-membered macrolide antibiotic produced by some strains of 
Streptomyces bikiniensis and possesses a broad spectrum of antimicrobial activity. Certain 
naturally occurring derivatives of chalcomycin produced by other Streptomyces organisms also 
have antimicrobial activity. For instance, the 8-deoxy chalcomycin derivative produced by 
Streptomyces hirsutus has antimicrobial activity against gram-positive bacteria. Chalcomycin 
has two modifying sugar molecules, D-mycinose and D-chalcose, the former being subject to 
post-glycosylation modification by O-methylation at two positions. For additional information 
regarding chalcomycin, see Woo, P.W.K. et al., J.A.C.S., 1962, 84, 1512; 1964, 86, 2724; 2726; 
Celmer, W.D., J.A.C.S., 1965, 87, 1801; Omura, S. et al., J.A.C.S., 1975, 97, 4001; Neszmelyi, 
A. et al., Chem. Comm., 1976, 97; Jardim, M.E. et al., Int. J. Mass Spectrom. Ion Phys., 1983, 
48, 189; Hauske, J.R. et al., J.O.C., 1986, 51, 2808; Kim, S.D. et al., J. Antibiot., 1996, 49, 955; 
Woo, P.W.K. et al., Tetrahedron, 1996, 52, 3857; and Goo, Y.M. et al., J. Antibiot., 1997, 50, 85. 
[0009] The chemical structure of chalcomycin, shown without stereochemistry, is provided 
by formula I below. 
Formula I 



[0010] Chalcomycin is synthesized by a Type I or modular PKS and modification enzymes. 
Post-PKS modification reactions include P450 oxidation at three sites to add hydroxyl groups 
and glycosylation at the C5 hydroxyl to add D-chalcose, and at the C20 hydroxyl to add allose, 
which is then methylated at two positions to yield D-mycinose. 

[0011] There is a need for recombinant nucleic acids, host cells, and methods of using those 
host cells to produce polyketides including but not limited to chalcomycin and chalcomycin 
analogs. These and other needs are met by the materials and methods provided by the present 
invention. 

SUMMARY OF THE INVENTION 
[0012] The present invention provides recombinant nucleic acids encoding polyketide 
synthases and polyketide modification enzymes. The recombinant nucleic acids of the invention 
are useful in the production of polyketides, including but not limited to chalcomycin and 
chalcomycin analogs and derivatives in recombinant host cells. The biosynthesis of chalcomycin 
is performed by a modular PKS and polyketide modification enzymes. The chalcomycin 
synthase is made up of several proteins, each having one or more modules. The modules have 
domains with specific synthetic functions. 

[0013] The present invention also provides domains and modules of the chalcomycin PKS 
and corresponding nucleic acid sequences encoding them and/or parts thereof. Such compounds 
are useful in the production of hybrid PKS enzymes and the recombinant genes that encode 
them. 

[0014] The present invention also provides modifying genes of the chalcomycin biosynthetic 
gene cluster in recombinant form, including but not limited to isolated form and incorporated 
into a vector or the chromosomal DNA of a host cell. The present invention also provides 
recombinant P450 hydroxylases that provide hydroxyl attachment points useful for further 
chemical modification. The P450 hydroxylases of the present invention include ChmHI, ChmPI 
and ChmPII hydroxylases. 

[0015] The present invention also provides recombinant host cells that contain the nucleic 
acids of the invention. In one embodiment, the host cell provided by the invention is a 
Streptomyces host cell that produces a chalcomycin modification enzyme and/or a domain, 
module, or protein of the chalcomycin PKS. Methods for the genetic manipulation of 
Streptomyces are described in Kieser et al., "Practical Streptomyces Genetics," The John Innes 
Foundation, Norwich (2000), which is incorporated herein by reference in its entirety. 
[0016] Accordingly, there is provided a recombinant PKS wherein at least 10, 15, 20, or 
more consecutive amino acids in one or more domains of one or more modules thereof are 
derived from one or more domains of one or more modules of chalcomycin polyketide synthase. 
Preferably at least an entire domain of a module of chalcomycin synthase is included. 
Representative chalcomycin PKS domains useful in this aspect of the invention include, for 
example, KR, DH, ER, AT, ACP and KS domains. In one embodiment of the invention, the 
PKS is assembled from polypeptides encoded by DNA molecules that comprise coding 
sequences for PKS domains, wherein at least one encoded domain corresponds to a domain of 
chalcomycin PKS. In such DNA molecules, the coding sequences are operably linked to control 
sequences so that expression therefrom in host cells is effective. In this manner, chalcomycin 
PKS coding sequences or modules and/or domains can be made to encode PKS to biosynthesize 
compounds having antibiotic or other useful bioactivity other than chalcomycin. 
[0017] These and other aspects of the present invention are described in more detail in the 
Detailed Description of the Invention, below. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[0018] Figure 1 illustrates the structure of the chalcomycin PKS biosynthetic gene cluster, 
and cosmids pKOS146.185.1 and pKOS146.185.10, which contain insert DNA encompassing 
the chalcomycin PKS gene cluster and associated modification enzyme genes. Abbreviations: 
ACP, acyl carrier protein; chm, chalcomycin gene; Orf, open reading frame. 
[0019] Figure 2 shows proposed pathways for post-PKS modification of the chalcomycin- 
spiramycin hybrid PKS macrolide product. 

DETAILED DESCRIPTION OF THE INVENTION 
[0020] The invention provides recombinant materials for the production of polyketides. In 
an aspect, the present invention provides recombinant nucleic acids encoding polyketide 
synthases that contain all or a portion of the chalcomycin PKS. The biosynthesis of chalcomycin 
is performed by a modular PKS and modification enzymes. The chalcomycin synthase is made 
up of five proteins, each having one or more modules, each module comprising domains with 
specific synthetic functions. Thus, the present invention also provides the domains and modules 
of the chalcomycin PKS and corresponding nucleic acid sequences encoding them in 
recombinant form. 

[0021] Modifying genes of the chalcomycin biosynthetic gene cluster are also provided, 
including but not limited to the genes for the ChmHI, ChmPI and ChmPII P450 hydroxylases 
that provide hydroxyl attachment points useful for further chemical modification. 
[0022] Methods and host cells for using these genes to produce or modify a polyketide in 
recombinant host cells are also provided. 

[0023] The nucleotide sequences encoding chalcomycin PKS and modifying proteins of the 
present invention were isolated from Streptomyces bikiniensis NRRL 2737 (obtained from the 
Agricultural Research Service Culture Collection, National Center for Agricultural Utilization 
Research, Peoria, Illinois USA). The chalcomycin PKS gene cluster and modifying genes are 
contained in cosmids pKOS 146.185.1 and pKOS146.185.10. The cloning and characterization 
of the chalcomycin PKS gene cluster is described in Example 1, infra. pKOS146-185.1 was 
deposited under the terms of the Budapest Treaty with the American Type Culture Collection, 
10801 University Blvd., Manassas, VA, 20110-2209, on 19 February 2003, with accession 
number PTA-4961. pKOS146-185.10 was deposited under the terms of the Budapest Treaty 
with the American Type Culture Collection, 10801 University Blvd., Manassas, VA, 20110-2209, 
on 19 February 2003, with accession number PTA-4962. 

[0024] Given the valuable properties of chalcomycin and its modifying enzymes, means to 
produce useful quantities thereof and derivatives or analogs of chalcomycin are valuable. 
Further, the chalcomycin modifying enzymes can also be used to modify other polyketides and 
produce derivatives thereof with enhanced solubility and/or bioactivity, for instance as 
antibiotics, and/or sites for further enzymatic or chemical modification. The nucleotide 
sequences of the chalcomycin biosynthetic gene cluster encoding chalcomycin PKS and 
modifying enzymes, and domains and/or modules of the PKS can be used, for example, to make 
antibiotics or other useful compounds in addition to chalcomycin, and in host cells in addition to 
Streptomyces bikiniensis. 

[0025] There is a need for recombinant nucleic acids, host cells, and methods of expressing 
those nucleic acids in host cells resulting in production of chalcomycin and/or its analogs or 
derivatives, and modifying enzymes, such as the cytochrome P450 hydroxylases that specifically 
attach hydroxyl groups on the resulting aglycone (which can then be used as attachment points 
for further modifications). The modifying P450s from the chalcomycin PKS cluster of the 
present invention can be used to make compounds in a host that does not naturally produce such 
compounds. These and other needs are met by the materials and methods of the present invention. 
[0026] In one aspect of the invention, purified and isolated DNA molecules are provided that 
comprise one or more coding sequences for one or more domains or modules of chalcomycin 
synthase. Examples of such encoded domains include chalcomycin synthase KR, DH, ER, AT, 
ACP, and KS domains. In one aspect, the invention provides DNA molecules in which the 
complete set of chalcomycin PKS-encoding sequences are operably linked to expression control 
sequences that are effective in suitable host cells to produce chalcomycin and/or its analogs or 
derivatives. In one aspect, the invention provides polypeptides comprising a portion of the 
coding sequences for the proteins of the chalcomycin synthase. 

[0027] Table 2 in Example 1 provides a description of genes in the chalcomycin PKS gene 
cluster (i.e., SEQ ID NO:1 and subsequences encoding modules, domains and ORFs, e.g., as indicated), 
as well as encoded proteins (including SEQ ID NOS: 2-43) or domains. It will be apparent 
from Table 2, and Figures 1 and 2, which DNA strand comprises the coding sequence for a 
protein (i.e., the strand having the sequence of SEQ ID NO:1, or its complement). 
[0028] In one aspect, the invention provides an isolated or recombinant DNA molecule 
comprising a nucleotide sequence that encodes at least one polypeptide, alternatively at least one 
module, alternatively at least one domain, involved in the biosynthesis of a chalcomycin. The 
invention also provides polypeptides comprising PKS interpolypeptide linker 
sequences, and polynucleotides encoding such linker sequences. Also provided by the invention 
are polypeptides comprising intrapolypeptide linker sequences, and polynucleotides encoding 
such linkers. 

[0029] In one aspect, the invention provides an isolated or recombinant DNA molecule 
comprising a sequence identical or substantially similar to at least one subsequence of SEQ ID 
NO:1 or its complement. In an embodiment, the subsequence comprises a sequence encoding a 
chalcomycin PKS domain or module. In an aspect, the invention provides a recombinant DNA 
molecule that encodes a polypeptide, module or domain derived from a chalcomycin polyketide 
synthase (PKS) gene cluster. In this context, a polypeptide, module or domain is derived from a 
chalcomycin polyketide synthase (PKS) gene cluster when it is encoded by a DNA with 
substantial sequence identity to the corresponding coding region of the S. bikiniensis 
chalcomycin gene cluster. For example, in an embodiment, the DNA encoding sequence of the 
polypeptide, module or domain hybridizes under stringent conditions to a nucleic acid having the 
sequence of SEQ ID NO:1 (or its complement). Generally, such a polypeptide, module or 
domain is biologically active, i.e., has at least one enzymatic activity characteristic of the 
polypeptide, module or domain encoded exactly by the corresponding sequence of SEQ ID NO:1 or 
its complement. The biological activity of a polypeptide of the invention can be measured by 
methods well known to the art. 

[0030] In one aspect, the invention provides an isolated or 
recombinant DNA molecule comprising a nucleotide sequence that encodes an open reading 
frame, module or domain having an amino acid sequence identical or substantially similar to an 
ORF, module or domain encoded by SEQ ID NO:1 or its complement. Generally, a polypeptide, 
module or domain having a sequence substantially similar to a reference sequence has 
substantially the same activity as the reference protein, module or domain (e.g., when integrated 
into an appropriate PKS framework using methods known in the art). In certain embodiments, 
one or more activities of a substantially similar polypeptide, module or domain are modified or 
inactivated as described below. 

[0031] In one aspect, the invention provides an isolated or recombinant DNA molecule 
comprising a nucleotide sequence that encodes at least one polypeptide, module or domain 
encoded by SEQ ID NO:1, e.g., a polypeptide, module or domain involved in the biosynthesis of 
a chalcomycin, wherein said nucleotide sequence comprises at least 20, 25, 30, 35, 40, 45, or 50 
contiguous base pairs identical to a sequence of SEQ ID NO:1 or its complement. In one aspect, 
the invention provides an isolated or recombinant DNA molecule comprising a nucleotide 
sequence that encodes at least one polypeptide, module or domain encoded by SEQ ID NO:1, 
e.g., a polypeptide, module or domain involved in the biosynthesis of a chalcomycin, wherein 
said polypeptide, module or domain comprises at least 10, 15, 20, 30, or 40 contiguous residues 
of a corresponding polypeptide, module or domain encoded by SEQ ID NO:1 or its complement. 
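
The contiguous-identity criterion of the preceding paragraph can be checked computationally. The following Python fragment is a minimal sketch under stated assumptions: the function names are hypothetical, sequences are assumed to contain only unambiguous A, C, G, and T bases, and the default threshold of 20 is simply the lowest value recited above.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def longest_shared_run(query: str, reference: str) -> int:
    """Length of the longest stretch of contiguous identical bases."""
    query, reference = query.upper(), reference.upper()
    best = 0
    # previous[j] holds the run length ending at the prior query base
    # and reference position j.
    previous = [0] * (len(reference) + 1)
    for q_base in query:
        current = [0] * (len(reference) + 1)
        for j, r_base in enumerate(reference, start=1):
            if q_base == r_base:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


def shares_contiguous_run(query: str, reference: str, minimum: int = 20) -> bool:
    """True if query matches reference, or its complement, over >= minimum bases."""
    longest = max(
        longest_shared_run(query, reference),
        longest_shared_run(query, reverse_complement(reference)),
    )
    return longest >= minimum
```

The quadratic scan is adequate for illustration; for a reference the size of a complete gene cluster, an indexed search would likely be preferred.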
[0032] In a related aspect, the invention provides a recombinant DNA molecule, comprising 
a sequence of at least about 200, optionally at least about 500, basepairs with a sequence 
identical or substantially identical to a protein encoding region of SEQ ID NO:1. In an 
embodiment, the DNA molecule encodes a polypeptide, module or domain derived from a 
chalcomycin polyketide synthase (PKS) gene cluster. 

[0033] It will be understood that SEQ ID NO:1 was determined using the inserts of pKOS 
146.185.1 and pKOS146-185.10. Accordingly, the invention provides an isolated or 
recombinant DNA molecule comprising a sequence identical or substantially similar to an ORF 
encoding sequence of the insert of pKOS 146.185.1 or pKOS146-185.10. 
[0034] Those of skill will recognize that, due to the degeneracy of the genetic code, a large 
number of DNA sequences encode the amino acid sequences of the domains, modules, and 
proteins of the chalcomycin PKS, the enzymes involved in chalcomycin modification and other 
polypeptides encoded by the genes of the chalcomycin biosynthetic gene cluster. The present 
invention contemplates all such DNAs. For example, it may be advantageous to optimize 
sequence to account for the codon preference of a host organism. The invention also 
contemplates naturally occurring genes encoding the chalcomycin PKS and tailoring enzymes 
that are polymorphic or other variants. In addition, it will be appreciated that polypeptides, 
modules and domains of the invention may comprise one or more conservative amino acid 
substitutions relative to the polypeptides encoded by SEQ ID NO:1. Examples of 
conservative substitutions include aspartic acid/glutamic acid as acidic amino acids; 
lysine/arginine/histidine as basic amino acids; leucine/isoleucine, methionine/valine, and 
alanine/valine as hydrophobic amino acids; and serine/glycine/alanine/threonine as hydrophilic 
amino acids. 
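
The scale of the degeneracy noted above, and the conservative substitution groups just listed, can be illustrated as follows. This Python fragment is a sketch only: the function names are hypothetical, the standard genetic code is assumed (a host with a different translation table would need a different table), and the substitution groups are transcribed from the preceding paragraph using one-letter amino acid codes.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons by the amino acid they encode.
SYNONYMOUS_CODONS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMOUS_CODONS.setdefault(aa, []).append(codon)


def count_encoding_sequences(peptide: str) -> int:
    """Number of distinct DNA sequences that encode the given peptide."""
    return prod(len(SYNONYMOUS_CODONS[aa]) for aa in peptide.upper())


# Conservative substitution groups as listed in the text.
CONSERVATIVE_GROUPS = [
    {"D", "E"},            # acidic
    {"K", "R", "H"},       # basic
    {"L", "I"}, {"M", "V"}, {"A", "V"},  # hydrophobic
    {"S", "G", "A", "T"},  # hydrophilic
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall within one of the listed groups."""
    pair = {original.upper(), replacement.upper()}
    return any(pair <= group for group in CONSERVATIVE_GROUPS)
```

Because the count is a product over residues, it grows rapidly with length: even the tripeptide Met-Leu-Ser has 1 x 6 x 6 = 36 possible coding sequences.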

[0035] As used herein, the terms "substantial identity," "substantial sequence identity," or 
"substantial similarity" in the context of nucleic acids, refer to a measure of sequence similarity 
between two polynucleotides. Substantial sequence identity can be determined by hybridization 
under stringent conditions, by direct comparison, or other means. For example, two 
polynucleotides can be identified as having substantial sequence identity if they are capable of 
specifically hybridizing to each other under stringent hybridization conditions. Other degrees of 
sequence identity (e.g., less than "substantial") can be characterized by hybridization under 
different conditions of stringency. "Stringent hybridization conditions" refers to conditions in a 
range from about 5°C to about 20°C or 25°C below the melting temperature (Tm) of the target 
sequence and a probe with exact or nearly exact complementarity to the target. As used herein, 
the melting temperature is the temperature at which a population of double-stranded nucleic acid 
molecules becomes half-dissociated into single strands. Methods for calculating the Tm of 
nucleic acids are well known in the art (see, e.g., Berger and Kimmel, 1987, Methods In 
Enzymology, Vol. 152: Guide To Molecular Cloning Techniques, San Diego: Academic Press, 
Inc. and Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd Ed., Vols. 1-3, 
Cold Spring Harbor Laboratory). Typically, stringent hybridization conditions for probes greater 
than 50 nucleotides are salt concentrations less than about 1.0 M sodium ion, typically about 0.01 
to 1.0 M sodium ion at pH 7.0 to 8.3, and temperatures at least about 50°C, preferably at least 
about 60°C. As noted, stringent conditions may also be achieved with the addition of 
destabilizing agents such as formamide, in which case lower temperatures may be employed. 
Exemplary conditions include hybridization at 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4 
pH 7.0, 1 mM EDTA at 50°C (or alternatively 65°C); wash with 2xSSC, 1% SDS, at 50°C (or 
alternatively 0.1-0.2xSSC, 1% SDS, at 50°C or 65°C). Other exemplary conditions for 
hybridization include (1) high stringency: 0.1xSSPE, 0.1% SDS, 65°C; (2) medium stringency: 
0.2xSSPE, 0.1% SDS, 50°C; and (3) low stringency: 1.0xSSPE, 0.1% SDS, 50°C. Equivalent 
stringencies may be achieved using alternative buffers, salts and temperatures. 
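
The temperature window and typical salt, pH, and temperature ranges recited above can be expressed as a short calculation. The following Python fragment is a sketch only: the function names are hypothetical, the Tm is taken as an input rather than computed, and the "about" qualifiers in the text are treated as hard cutoffs purely for simplicity.

```python
def stringent_temperature_window(tm_celsius: float, lower_offset: float = 25.0):
    """Return (low, high) hybridization temperatures in degrees C.

    The upper bound is 5 degrees below Tm; the lower bound is 20 or 25
    degrees below Tm, as recited in the text.
    """
    if lower_offset not in (20.0, 25.0):
        raise ValueError("lower_offset must be 20 or 25 degrees")
    return tm_celsius - lower_offset, tm_celsius - 5.0


def meets_typical_conditions(sodium_molar: float, ph: float,
                             temperature_celsius: float) -> bool:
    """Check the typical ranges given for probes longer than 50 nucleotides."""
    return (
        0.01 <= sodium_molar <= 1.0      # about 0.01 to 1.0 M sodium ion
        and 7.0 <= ph <= 8.3             # pH 7.0 to 8.3
        and temperature_celsius >= 50.0  # at least about 50 degrees C
    )
```

For a duplex with a Tm of 80°C, for example, the default window runs from 55°C to 75°C. The sketch does not account for destabilizing agents such as formamide, which lower the temperatures required.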
[0036] Alternatively, substantial sequence identity can be described as a percentage 
identity between two nucleotide or amino acid sequences. Two nucleic acid sequences are 
considered substantially identical when they are at least about 70% identical, at least about 75% 
identical, or at least about 80% identical, or at least about 85% identical, or at least about 90% 
identical, or at least about 95% or 98% identical. Two amino acid sequences are considered 
substantially identical when they are at least about 60% identical, more often at least 
about 70%, at least about 80%, or at least about 90% identical to the reference sequence. 
Percentage sequence (nucleotide or amino acid) identity is typically calculated using art known 
means to determine the optimal alignment between two sequences and comparing the two 
sequences. Optimal alignment of sequences may be conducted using the local homology 
algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2: 482, by the homology alignment 
algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48: 443, by the search for similarity 
method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. U.S.A. 85: 2444, by the BLAST 
algorithm of Altschul (1990) J. Mol. Biol. 215: 403-410; and Shpaer (1996) Genomics 
38:179-191, or by the method of Needleman et al. (1970) J. Mol. Biol. 48: 443-453; and Sankoff et al., 
1983, Time Warps, String Edits, and Macromolecules, The Theory and Practice of Sequence 
Comparison, Chapter One, Addison-Wesley, Reading, MA; generally by computerized 
implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin 
Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI; BLAST 
from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/)). In 
each case default parameters are used (for example the BLAST program uses as defaults a 
wordlength (W) of 11, the BLOSUM62 scoring matrix (see Henikoff (1992) Proc. Natl. Acad. 
Sci. USA 89: 10915-10919), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a 
comparison of both strands). 
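
The percentage identity calculation described above can be sketched with a small global alignment in the style of Needleman and Wunsch. The following Python fragment is illustrative only and is not the implementation used by GAP, BESTFIT, FASTA, or BLAST: the function names are hypothetical, the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity rather than any program's defaults, and identity is computed over the full alignment length, which is one of several conventions in use.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1):
    """Return one optimal global alignment of a and b as two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Percent of alignment columns holding identical residues."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    aligned_a, aligned_b = needleman_wunsch(a.upper(), b.upper())
    matches = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


def substantially_identical(a: str, b: str, nucleic_acid: bool = True) -> bool:
    """Apply the lowest thresholds in the text: 70% (nucleic), 60% (amino)."""
    return percent_identity(a, b) >= (70.0 if nucleic_acid else 60.0)
```

Two ten-base sequences differing at a single position align without gaps under these scores and give 90% identity, which exceeds the 70% nucleic acid threshold.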

[0037] As used herein the term "recombinant" has its usual meaning in the art and refers to a 
polynucleotide synthesized or otherwise manipulated in vitro, or to methods of using 
recombinant polynucleotides to produce gene products in cells or other biological systems. 
Thus, a "recombinant" polynucleotide is defined either by its method of production or its 
structure. In reference to its method of production, the process is use of recombinant nucleic 
acid techniques, e.g., involving human intervention in the nucleotide sequence, typically 
selection or production. Alternatively, a recombinant polynucleotide can be a polynucleotide 
made by generating a sequence comprising fusion of two fragments which are not naturally 
contiguous to each other, but is meant to exclude products of nature. Thus, for example, 
products made by transforming cells with any non-naturally occurring vector are encompassed, as 
are polynucleotides comprising sequence derived using any synthetic oligonucleotide process, as 
are polynucleotides from which a region has been deleted. A recombinant polynucleotide can 
also be a coding sequence that has been modified in vivo using a recombinant oligo or 
polynucleotide (such as a PKS in which a domain is inactivated by homologous recombination 
using a recombinant polynucleotide). A "recombinant" polypeptide is one expressed from a 
recombinant polynucleotide. 

[0038] It will be immediately recognized by those of skill that recombinant polypeptides of 
the invention have a variety of uses, some of which are described in detail below, including but 
not limited to use as enzymes, or components of enzymes, useful for the synthesis or 
modification of polyketides. Recombinant polypeptides encoded by the chalcomycin PKS gene 
cluster are also useful as antigens for production of antibodies. Such antibodies find use for 
purification of bacterial (e.g., Streptomyces bikiniensis) proteins, detection and typing of 
bacteria, and particularly, as tools for strain improvement (e.g., to assay PKS protein levels to 
identify "up-regulated" strains in which levels of polyketide producing or modifying proteins are 
elevated) or assessment of efficiency of expression of recombinant proteins. Polyclonal and 
monoclonal antibodies can be made by well known and routine methods (see, e.g., Harlow and 
Lane, 1988, ANTIBODIES: A LABORATORY MANUAL, COLD SPRING HARBOR 
LABORATORY, New York; Koehler and Milstein 1975, Nature 256:495). In selecting 
polypeptide sequences for antibody induction, it is not necessary to retain biological activity; however, the 
protein fragment must be immunogenic, and preferably antigenic (as can be determined by 
routine methods). Generally the protein fragment is produced by recombinant expression of a 
DNA comprising at least about 60, more often at least about 200, or even at least about 500 or 
more base pairs of protein coding sequence, such as a polypeptide, module or domain derived 
from a chalcomycin polyketide synthase (PKS) gene cluster. Methods for expression of 
recombinant proteins are well known. (See, e.g., Ausubel et al., 2002, Current Protocols In 
Molecular Biology, Greene Publishing and Wiley-Interscience, New York.) 
[0039] Further aspects of the invention include chimeric PKSs comprising a portion (in one 
embodiment at least a domain, optionally at least a module, or alternatively at least one 
polypeptide) from the chalcomycin PKS, and a portion (in one embodiment at least a domain, 
optionally at least a module, or alternatively at least a polypeptide) from one or more non- 
chalcomycin PKSs. For example, the invention provides (1) encoding DNA for a chimeric PKS 
that is substantially patterned on a non-chalcomycin producing enzyme, but which includes one 
or more functional domains or modules of chalcomycin PKS; (2) encoding DNA for a chimeric 
PKS that is substantially patterned on the chalcomycin PKS, but which includes one or more 
functional domains or modules of another PKS or NRPS; and (3) methods for making 
chalcomycin analogs and derivatives. With respect to item (1) above, examples include chimeric 
PKS enzymes wherein the genes for the erythromycin PKS, rapamycin PKS, tylosin PKS, and 
spiramycin PKS, or another PKS function as accepting genes, and one or more of the above- 
identified coding sequences for chalcomycin domains or modules are inserted as replacements 
for domains or modules of comparable function. With respect to item (2) above, examples 
include chimeric PKS enzymes wherein the chalcomycin PKS serves as an accepting gene, and 
one or more coding sequences for domains or modules of the erythromycin PKS, rapamycin PKS, 
tylosin PKS, spiramycin PKS, or another PKS are inserted as replacements for chalcomycin 
domains or modules of comparable function. A partial list of sources of PKS sequences for use 
in making chimeric 
molecules, for illustration and not limitation, includes Avermectin (U.S. Pat. No. 5,252,474; 
MacNeil et al., 1993, Industrial Microorganisms: Basic and Applied Molecular Genetics, Baltz, 
Hegeman, & Skatrud, eds. (ASM), pp. 245-256; MacNeil et al., 1992, Gene 115:119-25); 
Candicidin (FR008) (Hu et al., 1994, Mol. Microbiol. 14: 163-72); Epothilone (U.S. Pat. No. 
6,303,342); Erythromycin (WO 93/13663; U.S. Pat. No. 5,824,513; Donadio et al., 1991, Science 
252:675-79; Cortes et al., 1990, Nature 348:176-8); FK-506 (Motamedi et al., 1998, Eur. J. 
Biochem. 256:528-34; Motamedi et al., 1997, Eur. J. Biochem. 244:74-80); FK-520 (U.S. Pat. 
No. 6,503,737; see also Nielsen et al., 1991, Biochem. 30:5789-96); Lovastatin (U.S. Pat. No. 
5,744,350); Nemadectin (MacNeil et al., 1993, supra); Niddamycin (Kakavas et al., 1997, J. 
Bacteriol. 179:7515-22); Oleandomycin (Swan et al., 1994, Mol. Gen. Genet. 242:358-62; U.S. 
Pat. No. 6,388,099; Olano et al., 1998, Mol. Gen. Genet. 259:299-308); Platenolide (EP Pat. 
App. 791,656); Rapamycin (Schwecke et al., 1995, Proc. Natl. Acad. Sci. USA 92:7839-43; 
Aparicio et al., 1996, Gene 169:9-16); Rifamycin (August et al., 1998, Chemistry & Biology, 5: 
69-79); Soraphen (U.S. Pat. No. 5,716,849; Schupp et al., 1995, J. Bacteriology 177: 3673-79); 
Spiramycin (U.S. Pat. No. 5,098,837); Tylosin (EP 0 791,655; Kuhstoss et al., 1996, Gene 
183:231-36; U.S. Pat. No. 5,876,991). Additional suitable PKS coding sequences remain to be 
discovered and characterized, but will be available to those of skill (e.g., by reference to 
GenBank). 

[0040] Construction of such chimeric enzymes is most effectively achieved by construction 
of appropriate encoding polynucleotides. In preparing modified and chimeric proteins, it is not 
necessary, although it may be most efficient, to replace or substitute one or more entire domains 
or modules of one PKS (e.g., the chalcomycin PKS or another PKS) with an entire domain or 
module of a different PKS (e.g., the chalcomycin PKS or another PKS). Rather, peptide 
subsequences of a PKS domain or module that correspond to a peptide subsequence in an 
accepting domain or module, or which otherwise provide useful function, may be used as 
replacements. Accordingly, appropriate encoding DNAs for construction of such chimeric PKS 
include those that encode at least 10, 15, 20 or more amino acids of a selected chalcomycin 
domain or module. Recombinant methods for manipulating modular PKS genes to make 
chimeric PKS enzymes are described in U.S. Patent Nos. 5,672,491; 5,843,718; 5,830,750; and 
5,712,146; and in PCT publication Nos. 98/49315 and 97/02358. A number of genetic 
engineering strategies have been used with DEBS to demonstrate that the structures of 
polyketides can be manipulated to produce novel natural products, primarily analogs of the 
erythromycins (see the patent publications referenced supra and Hutchinson, 1998, Curr Opin 
Microbiol 1:319-329, and Baltz, 1998, Trends Microbiol 6:76-83). 

[0041] The invention methods may be directed to the preparation of an individual polyketide. 
The polyketide may or may not be novel, but the method of preparation permits a more 
convenient or alternative method of preparing it. The resulting polyketides may be further 
modified to convert them to other useful compounds. Examples of chemical structures of 
sixteen-membered macrolides that can be made using the materials and methods of the present 
invention are described in PCT Patent Publication WO 02/32916; U.S. Patent Application 
US20020128213A (app. no. 09/969,177); and copending U.S. provisional patent application no. 
60/493,966. 

[0042] The recombinant DNAs and DNA vectors of the invention can also be used to make 
"libraries" of polyketides. Generally, members of these polyketide libraries may themselves be 
novel compounds, and the invention further includes novel polyketide members of these 
libraries. Regardless of the naturally occurring PKS gene used as an acceptor, the invention 
provides libraries of polyketides by generating modifications in, or using a portion of, the 
chalcomycin PKS so that the protein complexes produced have altered activities in one or more 
respects, and thus produce polyketides other than the natural product of the PKS. Novel 
polyketides may thus be prepared, or polyketides in general prepared more readily, using this 
method. By providing a large number of different genes or gene clusters derived from a 
naturally occurring PKS gene cluster, each of which has been modified in a different way from 
the native cluster, an effectively combinatorial library of polyketides can be produced as a result 
of the multiple variations in these activities. 

[0043] As noted, in one aspect the invention provides recombinant PKS wherein at least 10, 
15, 20, or more consecutive amino acids in one or more domains of one or more modules thereof 
are derived from one or more domains of one or more modules of chalcomycin polyketide 
synthase. A polyketide synthase "derived from" a naturally occurring PKS contains the 
scaffolding encoded by all or the portion employed of the naturally occurring synthase gene, 
contains at least two modules that are functional, and contains mutations, deletions, or 
replacements of one or more of the activities of these functional modules so that the nature of the 
resulting polyketide is altered. This definition applies both at the protein and genetic levels. 
Particular embodiments include those wherein a KS, AT, KR, DH, or ER has been deleted or 
replaced by a version of the activity from a different PKS or from another location within the 
same PKS, and derivatives where at least one noncondensation cycle enzymatic activity (KR, 
DH, or ER) has been deleted or wherein any of these activities has been added or mutated so as 
to change the ultimate polyketide synthesized. 

[0044] There are at least five degrees of freedom for constructing a polyketide synthase in 
terms of the polyketide that will be produced. First, the polyketide chain length will be 
determined by the number of modules in the PKS. Second, the nature of the carbon skeleton of 
the polyketide will be determined by the specificities of the acyl transferases, which determine the 
nature of the extender units at each position — e.g., malonyl, methyl malonyl, methoxy malonyl, 
or ethyl malonyl, etc. Third, the loading domain specificity will also have an effect on the 
resulting carbon skeleton of the polyketide. Fourth, the oxidation state at various positions of the 
polyketide will be determined by the dehydratase and reductase portions of the modules. This 
will determine the presence and location of ketone, alcohol, alkene or alkane substituents at 
particular locations in the polyketide. Fifth, the stereochemistry of the resulting polyketide is a 
function of three aspects of the synthase. The first aspect is related to the AT/KS specificity 
associated with substituted malonyls as extender units, which affects stereochemistry only when 
the reductive cycle is missing or when it contains only a ketoreductase since the dehydratase 
would abolish chirality. Second, the specificity of the ketoreductase will determine the chirality of 
the corresponding hydroxyl group. Third, the enoyl reductase specificity for substituted malonyls 
as extender units will influence the result when there is a complete KR/DH/ER available. 
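
For illustration and not limitation, the relationship between module composition and the resulting polyketide described above can be sketched in Python. The sketch is a simplified model and not part of the disclosed methods: the class name ExtenderModule, the function describe_polyketide, and the three-module example are hypothetical, and the mapping assumes that a DH acts only where a KR is present and an ER only where both a KR and a DH are present.

```python
from dataclasses import dataclass

# Reductive domains present in a module -> functionality left at that position
# (ketone, alcohol, alkene or alkane, as listed in paragraph [0044]).
BETA_CARBON_STATE = {
    frozenset(): "ketone",
    frozenset({"KR"}): "alcohol",
    frozenset({"KR", "DH"}): "alkene",
    frozenset({"KR", "DH", "ER"}): "alkane",
}


@dataclass(frozen=True)
class ExtenderModule:
    extender_unit: str            # AT specificity, e.g. "malonyl" or "methylmalonyl"
    reductive_domains: frozenset  # subset of {"KR", "DH", "ER"}


def describe_polyketide(modules):
    """Return (number of condensations, [(module number, extender unit, state), ...])."""
    summary = []
    for index, module in enumerate(modules, start=1):
        state = BETA_CARBON_STATE.get(module.reductive_domains)
        if state is None:
            raise ValueError(f"module {index}: unsupported reductive domain set")
        summary.append((index, module.extender_unit, state))
    return len(modules), summary


# Hypothetical three-module PKS, for illustration only.
example = [
    ExtenderModule("methylmalonyl", frozenset({"KR"})),
    ExtenderModule("malonyl", frozenset({"KR", "DH"})),
    ExtenderModule("methylmalonyl", frozenset()),
]
n_condensations, summary = describe_polyketide(example)
```

In this model the number of extender modules sets the number of condensations, the AT specificity sets the extender unit, and the reductive domains set the functionality at each position. Loading domain specificity and stereochemistry are not modeled.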

[0045] As can be appreciated by those skilled in the art, polyketide biosynthesis can be 
manipulated to make a product other than the product of a naturally occurring PKS biosynthetic 
cluster. For example, AT domains can be altered or replaced to change specificity. For example, 
and not limitation, the AT domain of chalcomycin module 0 (loading domain) can be replaced by 
an AT with specificity for methylmalonyl-CoA to produce chalcomycin derivatives with a C-15 
ethyl group in place of the C-15 methyl group. The variable domains within a module can be 
deleted and/or inactivated or replaced with other variable domains found in other modules of the 
same PKS or from another PKS. See, e.g., Katz & McDaniel, Med Res Rev 19:543-558 (1999) 
and WO 98/49315. Similarly, entire modules can be deleted and/or replaced with other modules 
from the same PKS or another PKS. See, e.g., Gokhale et al., Science 284:482 (1999) and WO 
00/47724. For example, and not limitation, 3-hydroxy derivatives of chalcomycin can be 
produced using a modified chalcomycin PKS in which module 7 of the chalcomycin PKS is 
replaced by module 7 of the tylosin PKS (optionally with appropriate linker modifications). 
Similarly, protein subunits of different PKSs also can be mixed and matched to make compounds 
having the desired backbone and modifications. For example, subunits 1 and 2 (encoding 
modules 1-4) of the pikromycin PKS were combined with the DEBS3 subunit to make a hybrid 
PKS product (see Tang et al., Science 287:640 (2001), WO 00/26349 and WO 99/6159). Also 
see Examples, below. 

[0046] Mutations can be introduced into PKS genes such that polypeptides with altered 
activity are encoded. Polypeptides with "altered activity" include those in which one or more 
domains are inactivated or deleted, or in which a mutation changes the substrate specificity of a 
domain, as well as other alterations in activity. Mutations can be made to the native sequences 
using conventional techniques. The substrates for mutation can be an entire cluster of genes or 
only one or two of them; the substrate for mutation may also be portions of one or more of these 
genes. Techniques for mutation include preparing synthetic oligonucleotides including the 
mutations and inserting the mutated sequence into the gene encoding a PKS subunit using 
restriction endonuclease digestion. (See, e.g., Kunkel, T.A. Proc Natl Acad Sci USA (1985) 
82:448; Geisselsoder et al. BioTechniques (1987) 5:786.) Alternatively, the mutations can be 
effected using a mismatched primer (generally 10-20 nucleotides in length) that hybridizes to the 
native nucleotide sequence (generally cDNA corresponding to the RNA sequence), at a 
temperature below the melting temperature of the mismatched duplex. The primer can be made 
specific by keeping primer length and base composition within relatively narrow limits and by 
keeping the mutant base centrally located. (See Zoller and Smith, Methods in Enzymology (1983) 
100:468). Primer extension is effected using DNA polymerase. The product of the extension 
reaction is cloned, and those clones containing the mutated DNA are selected. Selection can be 
accomplished using the mutant primer as a hybridization probe. The technique is also applicable 
for generating multiple point mutations. (See, e.g., Dalbie-McFarland et al. Proc Natl Acad Sci 
USA (1982) 79:6409). PCR mutagenesis can also be used for effecting the desired mutations. 
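
By way of example only, the primer design considerations above (a primer of limited length with the mutant base centrally located) can be sketched in Python. The function names, the default flank of 8 nucleotides, and the test sequence are hypothetical. The melting temperature estimate uses the Wallace rule of thumb for short oligonucleotides, which is an assumption of this sketch and is not taken from the cited references; it estimates the perfectly matched duplex, so the mismatched duplex would be expected to melt at a lower temperature.

```python
def design_mutagenic_primer(template, position, new_base, flank=8):
    """Return a primer with new_base at `position` (0-based) and `flank`
    native bases on each side, so the mismatch sits in the centre.

    The primer is written in the sense of the supplied strand and would
    anneal to the complementary strand."""
    template = template.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if position - flank < 0 or position + flank >= len(template):
        raise ValueError("not enough flanking sequence around the mutation")
    if template[position] == new_base:
        raise ValueError("new_base matches the native base; nothing to mutate")
    left = template[position - flank:position]
    right = template[position + 1:position + 1 + flank]
    return left + new_base + right


def wallace_tm(oligo):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C) (Wallace rule)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# Hypothetical 21-nucleotide template; change G at index 10 to A.
template = "ACGTACGTACGTACGTACGTA"
primer = design_mutagenic_primer(template, position=10, new_base="A")
estimated_tm = wallace_tm(primer)
```

With a flank of 8 the primer is 17 nucleotides long, within the 10-20 nucleotide range mentioned above, and the mutant base occupies the central position.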

[0047] Random mutagenesis of selected portions of the nucleotide sequences encoding 
enzymatic activities can be accomplished by several different techniques known in the art, e.g., 
by inserting an oligonucleotide linker randomly into a plasmid, by irradiation with X-rays or 
ultraviolet light, by incorporating incorrect nucleotides during in vitro DNA synthesis, by 
error-prone PCR mutagenesis, and by preparing synthetic mutants or by damaging plasmid DNA in 
vitro with chemicals. Chemical mutagens include, for example, sodium bisulfite, nitrous acid, 
hydroxylamine, agents which damage or remove bases thereby preventing normal base-pairing 
such as hydrazine or formic acid, analogues of nucleotide precursors such as nitrosoguanidine, 5- 
bromouracil, 2-aminopurine, or acridine intercalating agents such as proflavine, acriflavine, 
quinacrine, and the like. Generally, plasmid DNA or DNA fragments are treated with chemicals, 
transformed into E. coli and propagated as a pool or library of mutant plasmids. 

[0048] In addition to providing mutated forms of regions encoding enzymatic activity, 
regions encoding corresponding activities from different PKS synthases or from different 
locations in the same PKS synthase can be recovered, for example, using PCR techniques with 
appropriate primers. By "corresponding" activity encoding regions is meant those regions 
encoding the same general type of activity - e.g., a ketoreductase activity in one location of a 
gene cluster would "correspond" to a ketoreductase-encoding activity in another location in the 
gene cluster or in a different gene cluster; similarly, a complete reductase cycle could be 
considered corresponding - e.g., KR/DH/ER could correspond to KR alone. 

[0049] If replacement of a particular target region in a host polyketide synthase is to be 
made, this replacement can be conducted in vitro using suitable restriction enzymes or can be 
effected in vivo using recombinant techniques involving homologous sequences framing the 
replacement gene. One such system involving plasmids of differing temperature sensitivities is 
described in PCT application WO 96/40968. Another useful method for modifying a PKS gene 
(e.g., making domain substitutions or "swaps") is a RED/ET cloning procedure developed for 
constructing domain swaps or modifications in an expression plasmid without first introducing 
restriction sites. The method is related to ET cloning methods (see Datsenko & Wanner, 2000, 
Proc. Natl. Acad. Sci. U.S.A. 97:6640-45; Muyrers et al., 2000, Genetic Engineering 22:77-98). 
The RED/ET cloning procedure is used to introduce a unique restriction site in the recipient 
plasmid at the location of the targeted domain. This restriction site is then used to 
linearize the recipient plasmid in a subsequent ET cloning step that introduces the modification. 
This linearization step is necessary in the absence of a selectable marker, which cannot be used 
for domain substitutions. An advantage of using this method for PKS engineering is that 
restriction sites do not have to be introduced in the recipient plasmid in order to construct the 
swap, which makes it faster and more powerful because boundary junctions can be altered more 
easily. 

[0050] In one embodiment, the invention provides a chimeric PKS in which one or more 
polypeptides are derived from a chalcomycin PKS polypeptide, and one or more polypeptides are 
derived from one or more non-chalcomycin PKS(s) that, like the chalcomycin PKS, produce a 
16-membered macrolide. Examples of PKSs that produce a 16-membered macrolide include 
the tylosin PKS, the spiramycin PKS, the niddamycin PKS, and the mycinamicin 
PKS. All the currently known PKSs for 16-membered macrolides consist of five large 
polypeptides encoded by colinear genes in a single operon. The arrangement of modules on 
these polypeptides is conserved. Thus, for known 16-membered macrolide PKSs, the first 
polypeptide has a loading module and two extender modules, the second a single extender 
module, the third two extender modules, the fourth a single extender module, and the fifth a 
single extender module followed by a thioesterase domain. The different aglycone core 
structures produced by different 16-membered macrolide PKSs are due to differences in the 
catalytic domains within each of these modules. 

[0051] As is illustrated in the examples, below, new hybrid 16-membered macrolides can be 
made by expressing combinations of PKS polypeptides from different sources in a suitable host. 
The hybrid PKS produces hybrid polyketides that optionally can be further modified by the 
post-PKS tailoring enzymes present within the host. See Examples, infra. 

[0052] By expressing particular combinations of these PKS polypeptides one can produce 
molecules with desired combinations of structural features based on, for example, the 
macrolactone structural features specified by each of the five polypeptides of different 16- 
membered macrolide PKSs as shown in Table 1, below. As noted, by expressing particular 
combinations of these PKS polypeptides one can produce molecules with desired combinations 
of structural features. Although, as described in the Examples and Table 1, selection of 
particular combinations of polypeptides provides a level of predictability as to the products 
formed by the hybrid PKS, the invention is not limited to any particular combinations or 
structures "predicted" by the table. 
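
For illustration and not limitation, the number of polypeptide combinations available from a given set of source PKSs can be sketched in Python. The three sources listed are an illustrative subset of the 16-membered macrolide PKSs named above, and the names SOURCES, N_POLYPEPTIDES, and enumerate_hybrids are hypothetical. The sketch only counts combinations; it does not predict which combinations assemble into a functional PKS.

```python
from itertools import product

SOURCES = ("chalcomycin", "tylosin", "spiramycin")  # illustrative subset
N_POLYPEPTIDES = 5  # polypeptides per 16-membered macrolide PKS


def enumerate_hybrids(sources=SOURCES, n=N_POLYPEPTIDES):
    """Yield each assignment of a source PKS to polypeptide positions 1..n,
    skipping assignments that draw every position from a single source."""
    for combo in product(sources, repeat=n):
        if len(set(combo)) > 1:
            yield combo


hybrids = list(enumerate_hybrids())
```

With three sources and five positions there are 3^5 = 243 assignments, of which 3 are the unmodified parent PKSs, leaving 240 hybrid combinations.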

[0053] In one embodiment, the components of the chimeric PKS are arranged onto 
polypeptides having interpolypeptide linkers that direct the assembly of the polypeptides into the 
functional PKS protein, such that it is not required that the PKS have the same arrangement of 
modules in the polypeptides as observed in natural PKSs. Suitable interpolypeptide linkers to 
join polypeptides and intrapolypeptide linkers to join modules within a polypeptide are described 
in PCT publication WO 00/47724. 

[0054] In one embodiment of the invention, the components of the PKS are arranged into 
five polypeptides similarly to natural PKS proteins involved in the biosynthesis of tylactone, 
platenolide, and the like. Thus, for example, the first polypeptide comprises the loading domain, 
first and second extender modules, and a C-terminal interpolypeptide linker region suitable for 
interaction with the second polypeptide. The second polypeptide comprises an N-terminal 
interpolypeptide linker region suitable for interaction with the first polypeptide, the third 
extender module, and a C-terminal interpolypeptide linker region suitable for interaction with the 
third polypeptide. The third polypeptide comprises an N-terminal interpolypeptide linker region 
suitable for interaction with the second polypeptide, the fourth and fifth extender modules, and a 
C-terminal interpolypeptide linker region suitable for interaction with the fourth polypeptide. 
The fourth polypeptide comprises an N-terminal interpolypeptide linker region suitable for 
interaction with the third polypeptide, the sixth extender module, and a C-terminal 
interpolypeptide linker region suitable for interaction with the fifth polypeptide. The fifth 
polypeptide comprises an N-terminal interpolypeptide linker region suitable for interaction with 
the fourth polypeptide, the seventh extender module, and the terminal thioesterase domain. 

[0055] In other embodiments of the invention, the components of the PKS residing on any 
given polypeptide are derived from the same source, and are naturally contiguous in that source, 
but the intrapolypeptide linkers are changed to allow proper assembly across heterologous 
polypeptide junctions to form a functional PKS. For example, in one embodiment of the 
invention, the first polypeptide is the intact first polypeptide of the chalcomycin PKS, encoded 
by chmGI, and comprises the loading domain and first and second extender modules from the 
chalcomycin PKS together with the native C-terminal interpolypeptide linker region that directs 
interaction of the first polypeptide with the second polypeptide of the chalcomycin PKS. The 
second polypeptide comprises the N-terminal interpolypeptide linker and module 3 of the 
chalcomycin PKS, encoded by chmGII, but with the C-terminal interpolypeptide linker replaced 
by the C-terminal interpolypeptide linker from the second polypeptide of the spiramycin PKS, 
encoded by srmG2. This replaced C-terminal interpolypeptide linker directs the second 
polypeptide to interact with the third polypeptide, taken from the spiramycin PKS and encoded 
by the srmG3 gene. The remaining polypeptides are the third, fourth, and fifth polypeptides of 
the spiramycin PKS, encoded by srmG3, srmG4, and srmG5, respectively. In another 
embodiment of the invention, the first polypeptide comprises the loading domain and first, 
second and third extender modules from the chalcomycin PKS, together with a C-terminal 
interpolypeptide linker region derived from the C-terminus of the first polypeptide of the tylosin 
PKS. The remaining polypeptides are the third, fourth, and fifth polypeptides of the tylosin PKS. 
The use of the appropriate interpolypeptide linkers directs the proper assembly of the PKS, 
thereby improving the catalytic activity of the resulting hybrid PKS. 
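
By way of example only, the linker arrangement of the first embodiment above can be sketched in Python as a consistency check on adjacent polypeptides. The sketch assumes, as a simplification, that a junction is compatible when the C-terminal linker of the upstream polypeptide and the N-terminal linker of the downstream polypeptide derive from the same PKS. The class Polypeptide and the function mismatched_junctions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Polypeptide:
    name: str
    n_linker: Optional[str]  # source PKS of the N-terminal linker (None for the first)
    c_linker: Optional[str]  # source PKS of the C-terminal linker (None for the last)


def mismatched_junctions(chain):
    """Return (upstream, downstream) name pairs whose linkers come from different PKSs."""
    return [
        (up.name, down.name)
        for up, down in zip(chain, chain[1:])
        if up.c_linker != down.n_linker
    ]


# First embodiment of paragraph [0055]: chmGII carries the srmG2 C-terminal linker.
engineered = [
    Polypeptide("chmGI", None, "chalcomycin"),
    Polypeptide("chmGII", "chalcomycin", "spiramycin"),
    Polypeptide("srmG3", "spiramycin", "spiramycin"),
    Polypeptide("srmG4", "spiramycin", "spiramycin"),
    Polypeptide("srmG5", "spiramycin", None),
]

# Same chain, but with chmGII keeping its native C-terminal linker.
unmodified = (
    [engineered[0], Polypeptide("chmGII", "chalcomycin", "chalcomycin")]
    + engineered[2:]
)

engineered_mismatches = mismatched_junctions(engineered)
unmodified_mismatches = mismatched_junctions(unmodified)
```

Under this assumption the engineered chain has no mismatched junction, whereas the chain with the native chmGII linker is flagged at the junction between the second and third polypeptides.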

[0056] As noted above, the DNA compounds of the invention can be expressed in host cells 
for production of known and novel compounds. Preferred hosts include fungal systems such as 
yeast and procaryotic hosts, but single cell cultures of, for example, mammalian cells could also 
be used. A variety of methods for heterologous expression of PKS genes and host cells suitable 
for expression of these genes and production of polyketides are described, for example, in U.S. 
Patent Nos. 5,843,718 and 5,830,750; WO 01/31035, WO 01/27306, and WO 02/068613; and 
U.S. patent application nos. 10/087,451 (published as US2002000087451); 60/355,211; and 
60/396,513 (corresponding to published application 20020045220). 

[0057] Appropriate host cells for the expression of the hybrid PKS genes include those 
organisms capable of producing the needed precursors, such as malonyl-CoA, 
methylmalonyl-CoA, ethylmalonyl-CoA, and methoxymalonyl-ACP, and having phosphopantetheinylation 
systems capable of activating the ACP domains of modular PKSs. See, for example, US Patent 
6,579,695. However, as disclosed in U.S. Patent No. 6,033,883, a wide variety of hosts can be 
used, even though some hosts natively do not contain the appropriate post-translational 
mechanisms to activate the acyl carrier proteins of the synthases. Also see WO 97/13845 and 
WO 98/27203. The host cell may natively produce none, some, or all of the required polyketide 
precursors, and may be genetically engineered so as to produce the required polyketide 
precursors. Such hosts can be modified with the appropriate recombinant enzymes to effect 
these modifications. Suitable host cells include Streptomyces, E. coli, yeast, and other 
procaryotic hosts which use control sequences compatible with Streptomyces spp. Examples of 
suitable hosts that either natively produce modular polyketides or have been engineered so as to 
produce modular polyketides include but are not limited to actinomycetes such as Streptomyces 
coelicolor, Streptomyces venezuelae, Streptomyces fradiae, Streptomyces ambofaciens, and 
Saccharopolyspora erythraea, eubacteria such as Escherichia coli, myxobacteria such as 
Myxococcus xanthus, and yeasts such as Saccharomyces cerevisiae. 

[0058] In one embodiment, any native modular PKS genes in the host cell have been deleted 
to produce a "clean host," as described in US Patent 5,672,491, incorporated herein by reference. 
The construction of the clean host S. fradiae K159-1, and the clean host S. fradiae K159-1/244-17a 
that produces methoxymalonyl-ACP are described below in Examples 2 and 3. Other 
organisms can be engineered using similar methods. 

[0059] In some embodiments, the host cell expresses, or is engineered to express, a 
polyketide "tailoring" or "modifying" enzyme. Once a PKS product is released, it is subject to 
post-PKS tailoring reactions. These reactions are important for biological activity and for the 
diversity seen among 16-membered macrolides. Tailoring enzymes normally associated with 
polyketide biosynthesis include oxygenases, glycosyl- and methyltransferases, acyltransferases, 
halogenases, cyclases, aminotransferases, and hydroxylases. Tailoring enzymes for modification 
of a product of the chalcomycin PKS, a non-chalcomycin PKS, or a chimeric PKS, can be those 
normally associated with chalcomycin biosynthesis (including, but not limited to, proteins 
described in Table 2) or "heterologous" tailoring enzymes. As noted above, the P450 hydroxylases 
encoded by the chmHI, chmPI and chmPII genes are of particular interest for production of 
polyketides having hydroxy groups well suited for subsequent chemical modification. 

[0060] For purposes of the present invention, tailoring enzymes can be expressed in the 
organism in which they are naturally produced, or as recombinant proteins in heterologous hosts. 
For example, as shown in Examples 6 and 7, hybrid PKSs having elements from the 
chalcomycin and spiramycin PKSs, or from the tylosin and chalcomycin PKSs, were expressed 
in an engineered host derived from a tylosin-producing strain of S. fradiae in which all or most of 
the post-PKS tailoring reactions of the tylosin biosynthetic pathway (see Baltz and Seno, 1988, 
"Genetics of Streptomyces fradiae and tylosin biosynthesis" Annu Rev Microbiol 42:547-74) 
were expressed and which modified the polyketide product. 

[0061] In some cases, the structure produced by the heterologous or hybrid PKS may be 
modified with different efficiencies by post-PKS tailoring enzymes from different sources. In 
such cases, post-PKS tailoring enzymes can be recruited from other pathways to obtain the 
desired compound. For example, as discussed in Example 6, a chmH gene has been used to 
modify the product of a chalcomycin-spiramycin hybrid PKS. Similarly, host cells can be 
selected, or engineered, for expression of a glycosylation apparatus (discussed below) or amide 
synthases (see, for example, U.S. patent publication 20020045220 "Biosynthesis of Polyketide 
Synthase Substrates"). For example and not limitation, the host cell can contain the desosamine, 
megosamine, and/or mycarose biosynthetic genes, corresponding glycosyl transferase genes, and 
hydroxylase genes (e.g., picK, megK, eryK, megF, and/or eryF). Methods for glycosylating 
polyketides are generally known in the art and can be applied in accordance with the methods of 
the present invention; the glycosylation may be effected intracellularly by providing the 
appropriate glycosylation enzymes or may be effected in vitro using chemical synthetic means as 
described herein and in WO 98/49315, incorporated herein by reference. Glycosylation with 
desosamine, mycarose, and/or megosamine is effected in accordance with the methods of the 
invention in recombinant host cells provided by the invention. Alternatively and as noted, 
glycosylation may be effected intracellularly using endogenous or recombinantly produced 
intracellular glycosylases. In addition, synthetic chemical methods may be employed. 

[0062] Alternatively, the aglycone compounds can be produced in the recombinant host cell, 
and the desired modification (e.g., glycosylation and hydroxylation) steps carried out in vitro 
(e.g., using purified enzymes, isolated from native sources or recombinantly produced) or in 
vivo in a converting cell different from the host cell (e.g., by supplying the converting cell with 
the aglycone). 

[0063] It will be apparent to the reader that a variety of recombinant vectors can be utilized 
in the practice of aspects of the invention. As used herein, "vector" refers to polynucleotide 
elements that are used to introduce recombinant nucleic acid into cells for either expression or 
replication. Selection and use of such vehicles is routine in the art. An "expression vector" 
includes vectors capable of expressing DNAs that are operatively linked with regulatory 
sequences, such as promoter regions. Thus, an expression vector refers to a recombinant DNA 
or RNA construct, such as a plasmid, a phage, recombinant virus or other vector that, upon 
introduction into an appropriate host cell, results in expression of the cloned DNA. Appropriate 
expression vectors are well known to those of skill in the art and include those that are replicable 
in eukaryotic cells and/or prokaryotic cells and those that remain episomal or those which 
integrate into the host cell genome. 

[0064] The vectors used to perform the various operations to replace the enzymatic activity 
in the host PKS genes or to support mutations in these regions of the host PKS genes may be 
chosen to contain control sequences operably linked to the resulting coding sequences in a 
manner that expression of the coding sequences may be effected in an appropriate host. Suitable 
control sequences include those which function in eucaryotic and procaryotic host cells. If the 
cloning vectors employed to obtain PKS genes encoding derived PKS lack control sequences for 
expression operably linked to the encoding nucleotide sequences, the nucleotide sequences are 
inserted into appropriate expression vectors. This can be done individually, or using a pool of 
isolated encoding nucleotide sequences, which can be inserted into host vectors, the resulting 
vectors transformed or transfected into host cells, and the resulting cells plated out into 
individual colonies. 

[0065] Suitable control sequences for single cell cultures of various types of organisms are 
well known in the art. Control systems for expression in yeast are widely available and are 
routinely used. Control elements include promoters, optionally containing operator sequences, 
and other elements depending on the nature of the host, such as ribosome binding sites. 
Particularly useful promoters for procaryotic hosts include those from PKS gene clusters which 
result in the production of polyketides as secondary metabolites, including those from Type I or 
aromatic (Type II) PKS gene clusters. Examples are act promoters, tcm promoters, spiramycin 
promoters, tylosin promoter (e.g., tylGIp, see Rodriguez et al., "Rapid engineering of polyketide 
overproduction by gene transfer to industrially optimized strains" J Ind Microbiol Biotechnol. 
2003 Apr 16; and DeHoff et al., "Streptomyces fradiae tylactone synthase, starter module and 
modules 1-7, (tylG) gene, complete cds" Genbank Accession No. U78289), and other promoters. 
However, other bacterial promoters, such as those derived from sugar metabolizing enzymes, 
such as galactose, lactose (lac) and maltose, are also useful. Additional examples include 
promoters derived from biosynthetic enzymes such as for tryptophan (trp), the β-lactamase (bla), 
bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter 
(U.S. Patent No. 4,551,433), can be used. 

[0066] As noted, particularly useful control sequences are those which themselves, or with 
suitable regulatory systems, activate expression during transition from growth to stationary phase 
in the vegetative mycelium. The system contained in the plasmid identified as pCK7, i.e., the 
actI/actIII promoter pair and the actII-ORF4 (an activator gene), is particularly preferred. 
Particularly preferred hosts are those which lack their own means for producing polyketides so 
that a cleaner result is obtained. Illustrative control sequences, vectors, and host cells of these 
types include the modified S. coelicolor CH999 and vectors described in PCT publication 
WO 96/40968 and similar strains of S. lividans. See U.S. Patent Nos. 5,672,491; 5,830,750, 
5,843,718; and 6,177,262, each of which is incorporated herein by reference. 

[0067] Other regulatory sequences may also be desirable which allow for regulation of 
expression of the PKS sequences relative to the growth of the host cell. Regulatory sequences 
are known to those of skill in the art, and examples include those which cause the expression of a 
gene to be turned on or off in response to a chemical or physical stimulus, including the presence 
of a regulatory compound. Other types of regulatory elements may also be present in the vector, 
for example, enhancer sequences. 

[0068] Selectable markers can also be included in the recombinant expression vectors. A 
variety of markers are known which are useful in selecting for transformed cell lines and 
generally comprise a gene whose expression confers a selectable phenotype on transformed cells 
when the cells are grown in an appropriate selective medium. Such markers include, for 
example, genes which confer antibiotic resistance or sensitivity to the plasmid. Alternatively, 
several polyketides are naturally colored, and this characteristic provides a built-in marker for 
screening cells successfully transformed by the present constructs. 

[0069] The various PKS nucleotide sequences, or a mixture of such sequences, can be cloned 
into one or more recombinant vectors as individual cassettes, with separate control elements or 
under the control of a single promoter. The PKS subunits or components can include flanking 
restriction sites to allow for the easy deletion and insertion of other PKS subunits so that hybrid 
or chimeric PKSs can be generated. The design of such restriction sites is known to those of 
skill in the art and can be accomplished using the techniques described above ; such as site- 
directed mutagenesis and PCR. Methods for introducing the recombinant vectors of the present 
invention into suitable hosts are known to those of skill in the art and typically include the use of 
CaCh or other agents, such as divalent cations, lipofection, DMSO, protoplast transformation, 
conjugation, and electroporation. 

[0070] Expression vectors containing nucleotide sequences encoding a variety of PKS 
systems for the production of different polyketides can be transformed into the appropriate host 
cells to construct a polyketide library. In one approach, a mixture of such vectors is transformed 
into the selected host cells and the resulting cells plated into individual colonies and selected for 
successful transformants. Each individual colony has the ability to produce a particular PKS and 
ultimately a particular polyketide. Typically, there will be duplications in some of the colonies; 
the subset of the transformed colonies that contains a different PKS in each member colony can 
be considered the library. Alternatively, the expression vectors can be used individually to 
transform hosts, which transformed hosts are then assembled into a library. A variety of 
strategies might be devised to obtain a multiplicity of colonies each containing a PKS gene 
cluster derived from the naturally occurring host gene cluster so that each colony in the library 
produces a different PKS and ultimately a different polyketide. The number of different 
polyketides that are produced by the library is typically at least four, more typically at least ten, 
and preferably at least 20, more preferably at least 50, reflecting similar numbers of different 
altered PKS gene clusters and PKS gene products. The number of members in the library is 
arbitrarily chosen; however, the degrees of freedom outlined above with respect to the variation 
of starter, extender units, stereochemistry, oxidation state, and chain length are quite large. The 
polyketide producing colonies can be identified and isolated using known techniques and the 
produced polyketides further characterized. The polyketides produced by these colonies can be 
used collectively in a panel to represent a library or may be assessed individually for activity. 
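
The observation that a mixed transformation yields duplicate colonies can be illustrated with a short calculation. The sketch below is not part of the disclosed method; it assumes, purely for illustration, that each colony carries exactly one construct and that every construct in the mixture is equally likely to be taken up. The function names are hypothetical.

```python
def expected_distinct(num_constructs: int, num_colonies: int) -> float:
    """Expected number of different constructs among the picked colonies.

    Assumption (illustrative only): one construct per colony, and all
    constructs in the vector mixture are equally likely to be recovered.
    """
    if num_constructs <= 0 or num_colonies < 0:
        raise ValueError("need num_constructs > 0 and num_colonies >= 0")
    # Probability that one given construct is absent from every picked colony.
    miss = (1.0 - 1.0 / num_constructs) ** num_colonies
    return num_constructs * (1.0 - miss)


def colonies_needed(num_constructs: int, target_distinct: float) -> int:
    """Smallest number of colonies whose expected distinct count reaches the target."""
    if not 0 < target_distinct < num_constructs:
        raise ValueError("target must lie strictly between 0 and num_constructs")
    n = 0
    while expected_distinct(num_constructs, n) < target_distinct:
        n += 1
    return n


if __name__ == "__main__":
    # A mixture of 50 constructs, aiming for a library of about 45 members.
    print(colonies_needed(50, 45))
    print(round(expected_distinct(50, 100), 1))
```

Under these assumptions the number of colonies to pick grows faster than the number of distinct library members desired, which is why the duplicated colonies are simply set aside and the non-redundant subset is treated as the library.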
[0071] The libraries can thus be considered at four levels: (1) a multiplicity of colonies each 
with a different PKS encoding sequence encoding a different PKS cluster but all derived from a 
naturally occurring PKS cluster; (2) colonies which contain the proteins that are members of the 
PKS produced by the coding sequences; (3) the polyketides produced; and (4) compounds 
derived from the polyketides. Of course, combination libraries can also be constructed wherein 
members of a library derived, for example, from the erythromycin PKS can be considered as a 
part of the same library as those derived from, for example, the rapamycin PKS cluster. 
[0072] Colonies in the library are induced to produce the relevant synthases and thus to 
produce the relevant polyketides to obtain a library of candidate polyketides. The polyketides 
secreted into the media can be screened for binding to desired targets, such as receptors, 
signaling proteins, and the like. The supernatants per se can be used for screening, or partial or 
complete purification of the polyketides can first be effected. Typically, such screening methods 
involve detecting the binding of each member of the library to receptor or other target ligand. 
Binding can be detected either directly or through a competition assay. Means to screen such 
libraries for binding are well known in the art. Alternatively, individual polyketide members of 
the library can be tested against a desired target. In this event, screens wherein the biological 
response of the target is measured can be included. 

[0073] Thus, the present invention provides recombinant DNA molecules and vectors 
comprising those recombinant DNA molecules that encode all or a portion of the chalcomycin 
PKS and/or chalcomycin modification enzymes and that, when transformed into a host cell and 
the host cell is cultured under conditions that lead to the expression of said chalcomycin PKS 
and/or modification enzymes, results in the production of polyketides including but not limited to 
chalcomycin and/or analogs or derivatives thereof in useful quantities. The present invention 
also provides recombinant host cells comprising those recombinant vectors. 
[0074] Suitable culture conditions for production of polyketides using the cells of the 
invention will vary according to the host cell and the nature of the polyketide being produced, 
but will be known to those of skill in the art. See, for example, the examples below and WO 
98/27203 "Production of Polyketides in Bacteria and Yeast" and WO 01/83803 "Overproduction 
Hosts for Biosynthesis of Polyketides." 

[0075] The polyketide product produced by host cells of the invention can be recovered (i.e., 
separated from the producing cells and at least partially purified) using routine techniques (e.g., 
extraction from broth followed by chromatography). 

[0076] The compositions, cells and methods of the invention may be directed to the 
preparation of an individual polyketide or a number of polyketides. The polyketide may or may 
not be novel, but the method of preparation permits a more convenient or alternative method of 
preparing it. It will be understood that the resulting polyketides may be further modified to 
convert them to other useful compounds. For example, an ester linkage may be added to produce 
a "pharmaceutically acceptable ester" (i.e., an ester that hydrolyzes under physiologically 
relevant conditions to produce a compound or a salt thereof). Illustrative examples of suitable 
ester groups include but are not limited to formates, acetates, propionates, butyrates, succinates, 
and ethylsuccinates. 

[0077] The polyketide product produced by recombinant cells can be chemically modified in 
a variety of ways, for example by addition of a protecting group to produce prodrug forms. 
A variety of protecting groups are disclosed, for example, in T.W. 
Greene and P.G.M. Wuts, Protective Groups in Organic Synthesis, Third Edition, John Wiley & 
Sons, New York (1999). Prodrugs are in general functional derivatives of the compounds that are 
readily convertible in vivo into the required compound. Conventional procedures for the 
selection and preparation of suitable prodrug derivatives are described, for example, in "Design 
of Prodrugs," H. Bundgaard ed., Elsevier, 1985. 

[0078] Similarly, improvements in water solubility of a polyketide compound can be 
achieved by addition of groups containing solubilizing functionalities to the compound or by 
removal of hydrophobic groups from the compound, so as to decrease the lipophilicity of the 
compound. Typical groups containing solubilizing functionalities include, but are not limited to: 
2-(dimethylaminoethyl)amino, piperidinyl, N-alkylpiperidinyl, hexahydropyranyl, furfuryl, 
tetrahydrofurfuryl, pyrrolidinyl, N-alkylpyrrolidinyl, piperazinylamino, N-alkylpiperazinyl, 
morpholinyl, N-alkylaziridinylmethyl, (1-azabicyclo[1.3.0]hex-1-yl)ethyl, 
2-(N-methylpyrrolidin-2-yl)ethyl, 2-(4-imidazolyl)ethyl, 2-(1-methyl-4-imidazolyl)ethyl, 
2-(1-methyl-5-imidazolyl)ethyl, 2-(4-pyridyl)ethyl, and 3-(4-morpholino)-1-propyl. In the case of 
geldanamycin analogs, solubilizing groups can be added by reaction with amines, which results 
in the displacement of the 17-methoxy group by the amine (see, Schnur et al., 1995, "Inhibition 
of the Oncogene Product p185erbB-2 in Vitro and in Vivo by Geldanamycin and 
Dihydrogeldanamycin Derivatives," J. Med. Chem. 38, 3806-3812; Schnur et al., 1995, "erbB-2 
Oncogene Inhibition by Geldanamycin Derivatives: Synthesis, Mechanism of Action, and 
Structure-Activity Relationships," J. Med. Chem. 38, 3813-3820; Schnur et al., "Ansamycin 
Derivatives as Antioncogene and Anticancer Agents," U.S. Patent 5,932,655; all of which are 
incorporated herein by reference). Typical amines containing solubilizing functionalities include 
2-(dimethylamino)-ethylamine, 4-aminopiperidine, 4-amino-1-methylpiperidine, 
4-aminohexahydropyran, furfurylamine, tetrahydrofurfurylamine, 
3-(aminomethyl)-tetrahydrofuran, 2-(aminomethyl)pyrrolidine, 2-(aminomethyl)-1-methylpyrrolidine, 
1-methylpiperazine, morpholine, 1-methyl-2-(aminomethyl)aziridine, 
1-(2-aminoethyl)-1-azabicyclo-[1.3.0]hexane, 1-(2-aminoethyl)piperazine, 4-(2-aminoethyl)morpholine, 
1-(2-aminoethyl)pyrrolidine, 2-(2-aminoethyl)pyridine, 2-fluoroethylamine, 2,2-difluoroethylamine, and the 
like. 

[0079] In addition to post-synthesis chemical or biosynthetic modifications, various 
polyketide forms or compositions can be produced, including but not limited to mixtures of 
polyketides, enantiomers, diastereomers, geometrical isomers, polymorphic crystalline forms and 
solvates, and combinations and mixtures thereof. 

[0080] Many other modifications of polyketides produced according to the invention will be 
apparent to those of skill, and can be accomplished using techniques of pharmaceutical 
chemistry. 

[0081] Prior to use the PKS product (whether modified or not) can be formulated for storage, 
stability or administration. For example, the polyketide products can be formulated as a 
"pharmaceutically acceptable salt." Suitable pharmaceutically acceptable salts of compounds 
include acid addition salts which may, for example, be formed by mixing a solution of the 
compound with a solution of a pharmaceutically acceptable acid such as hydrochloric acid, 
hydrobromic acid, sulfuric acid, fumaric acid, maleic acid, succinic acid, benzoic acid, acetic 
acid, citric acid, tartaric acid, phosphoric acid, carbonic acid, or the like. Where the compounds 
carry one or more acidic moieties, pharmaceutically acceptable salts may be formed by treatment 
of a solution of the compound with a solution of a pharmaceutically acceptable base, such as 
lithium hydroxide, sodium hydroxide, potassium hydroxide, tetraalkylammonium hydroxide, 
lithium carbonate, sodium carbonate, potassium carbonate, ammonia, alkylamines, or the like. 
[0082] Prior to administration to a mammal the PKS product will be formulated as a 
pharmaceutical composition according to methods well known in the art, e.g., combination with 
a pharmaceutically acceptable carrier. The term "pharmaceutically acceptable carrier" refers to a 
medium that is used to prepare a desired dosage form of a compound. A pharmaceutically 
acceptable carrier can include one or more solvents, diluents, or other liquid vehicles; dispersion 
or suspension aids; surface active agents; isotonic agents; thickening or emulsifying agents; 
preservatives; solid binders; lubricants; and the like. Remington's Pharmaceutical Sciences, 
Fifteenth Edition, E.W. Martin (Mack Publishing Co., Easton, PA, 1975) and Handbook of 
Pharmaceutical Excipients, Third Edition, A.H. Kibbe ed. (American Pharmaceutical Assoc. 
2000), disclose various carriers used in formulating pharmaceutical compositions and known 
techniques for the preparation thereof. 

[0083] The composition may be administered in any suitable form such as solid, semisolid, 
or liquid form. See Pharmaceutical Dosage Forms and Drug Delivery Systems, 5th edition, 
Lippincott Williams & Wilkins (1991). In an embodiment, for illustration and not limitation, the 
polyketide is combined in admixture with an organic or inorganic carrier or excipient suitable for 
external, enteral, or parenteral application. The active ingredient may be compounded, for 
example, with the usual non-toxic, pharmaceutically acceptable carriers for tablets, pellets, 
capsules, suppositories, pessaries, solutions, emulsions, suspensions, and any other form suitable 
for use. The carriers that can be used include water, glucose, lactose, gum acacia, gelatin, 
mannitol, starch paste, magnesium trisilicate, talc, corn starch, keratin, colloidal silica, potato 
starch, urea, and other carriers suitable for use in manufacturing preparations, in solid, semi- 
solid, or liquified form. In addition, auxiliary stabilizing, thickening, and coloring agents and 
perfumes may be used. 

[0084] It will be apparent from the foregoing that the invention provides many useful 
compositions and methods of using them. Without intending to limit its scope, in one aspect, the 
invention provides a recombinant DNA molecule that encodes a polypeptide, module or domain 
derived from a chalcomycin polyketide synthase (PKS) gene cluster. In an embodiment, the 
DNA molecule (or its complement) has substantial sequence identity to SEQ ID NO:1. In an 
embodiment, the DNA molecule hybridizes under stringent conditions to a nucleic acid having 
the sequence of SEQ ID NO:1 or its complement. In a related aspect, the invention provides a 
recombinant DNA molecule, comprising a sequence of at least about 200, optionally at least 
about 500, basepairs with a sequence identical or substantially identical to a protein encoding 
region of SEQ ID NO:1. In an embodiment, the DNA molecule encodes a polypeptide, module 
or domain derived from a chalcomycin polyketide synthase (PKS) gene cluster. 
[0085] In one embodiment, the recombinant DNA molecule comprises a sequence encoding 
at least one module of a chalcomycin polyketide synthase. In an embodiment, the recombinant 
DNA molecule encodes a chalcomycin polyketide synthase polypeptide selected from the group 
consisting of ChmGI, ChmGII, ChmGIII, ChmGIV, and ChmGV. 

[0086] In one aspect, the recombinant DNA molecule includes a coding sequence for a 
chalcomycin modifying enzyme, such as a chalcomycin P450 hydroxylase enzyme selected from 
the group consisting of ChmHI, ChmPI, and ChmPII. 

[0087] The invention also provides vectors that comprise the recombinant DNA molecules of 
the invention. In an aspect, the invention provides a recombinant host cell comprising the vector. 
In a related aspect, the invention provides a recombinant host cell comprising a DNA molecule 
of the invention integrated into the cell chromosomal DNA. 

[0088] Also provided is a chimeric PKS that comprises at least one domain of a chalcomycin 
PKS, and a cell containing such a chimeric PKS. In a related aspect, the invention provides a 
modified functional chalcomycin PKS that differs from the S. bikiniensis chalcomycin PKS by 
the inactivation of at least one domain of the chalcomycin PKS and/or addition of at least one 
domain of a non-chalcomycin PKS (for example, a loading domain, a thioesterase domain, an 
AT domain, a KS domain, an ACP domain, a KR domain, a DH domain, and an ER domain). 
The invention provides a cell comprising a modified functional PKS. The invention also 
provides a method to prepare a chalcomycin derivative which method comprises providing 
substrates including extender units to the cell. 

[0089] In an aspect, the invention provides a recombinant expression system capable of 
producing a chalcomycin synthase domain in a host cell, said system comprising an encoding 
sequence for a chalcomycin polyketide synthase domain, and said encoding sequence being 
operably linked to control sequences effective in said cell to produce RNA that is translated into 
said domain, and a host cell modified to contain the recombinant expression system. 
[0090] The invention provides an isolated polypeptide encoded by a recombinant 
chalcomycin polyketide synthase (PKS) gene, and a recombinant host cell containing or 
expressing such a polypeptide. 

[0091] The invention also provides a recombinant S. bikiniensis cell in which a chmGI, 
chmGII, chmGIII, chmGIV, or chmGV gene is disrupted so as to reduce or eliminate production of 
chalcomycin. 

[0092] The invention also provides a recombinant DNA molecule encoding a first protein 
comprising one or more modules of a chalcomycin PKS and a second protein comprising one or 
more modules of a tylosin PKS or spiramycin PKS, optionally one or more polypeptides of a 
chalcomycin PKS and one or more polypeptides of a tylosin PKS or spiramycin PKS. In a 
related aspect, the invention provides a recombinant host cell comprising a hybrid polyketide 
synthase comprising one or more modules of a chalcomycin PKS and one or more modules of a 
tylosin PKS or spiramycin PKS. 



EXAMPLES 

[0093] The following examples are provided to illustrate, but are not intended to limit, the 
present invention. 

Example 1 Isolation and Characterization of Chalcomycin PKS Cluster From Streptomyces 
bikiniensis NRRL 2737. 

[0094] Growth of organism and extraction of genomic DNA. For genomic DNA extraction, a 
spore stock of Streptomyces bikiniensis NRRL 2737 was used to prepare a seed culture. Spores 
were stored as a suspension in 25% (v/v) glycerol at -80°C and used to inoculate 5 ml of 
unbuffered Trypticase Soy Broth (TSB) liquid media. The entire seed culture was transferred 
into 50 ml of the same growth medium in a 250 ml baffled Erlenmeyer flask and incubated with 
shaking for 24 h at 28°C. A 10 ml portion of the cell suspension was centrifuged (10,000 x g) 
and the resulting pellet was washed once with 10 ml buffer 1 (Tris, 50 mM, pH 7.5; 20 mM 
EDTA). The pellet was suspended in 3.5 ml of buffer 1 containing 150 µg/ml RNase 
(Sigma-Aldrich) and 1 mg/ml lysozyme. After incubation of the mixture at 37°C for 30 min, the salt 
concentration was adjusted by adding 850 µl 5 M NaCl solution, then the mixture was extracted 
two times with phenol:chloroform:isoamyl alcohol (25:24:1, vol/vol) with gentle agitation 
followed by centrifugation for 10 min at 3500 x g. After precipitation with 1 vol of isopropanol, 
the genomic DNA knot was spooled on a glass rod and redissolved in 200 µl of water. This 
method yielded about 0.5 mg total DNA. Standard agarose gel electrophoresis using 0.7% 
Seakem® LE Agarose (BioWhittaker Molecular Applications, Rockland, ME) at a voltage of 50 
mV overnight revealed that the sample contained mainly high molecular weight DNA of 50 kb 
or greater. 
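
The salt adjustment and yield stated above imply a final NaCl concentration and a DNA stock concentration that the text does not spell out. The following is a minimal sketch of that arithmetic, assuming that the two liquid volumes are simply additive and that the volume of the cell pellet can be neglected; neither assumption is stated in the protocol, and the function name is illustrative.

```python
def final_molarity(stock_molar: float, stock_ml: float, sample_ml: float) -> float:
    """Concentration after adding a stock solution to a sample.

    Assumption: volumes are additive and the sample initially contains
    none of the solute (pellet volume is ignored).
    """
    return stock_molar * stock_ml / (stock_ml + sample_ml)


# 850 microliters of 5 M NaCl added to the 3.5 ml lysate.
nacl_molar = final_molarity(stock_molar=5.0, stock_ml=0.850, sample_ml=3.5)

# About 0.5 mg of DNA redissolved in 200 microliters of water.
dna_mg_per_ml = 0.5 / 0.200

if __name__ == "__main__":
    print(f"NaCl after adjustment: {nacl_molar:.2f} M")
    print(f"DNA stock: {dna_mg_per_ml:.1f} mg/ml")
```

Under these assumptions the lysate is brought to roughly 1 M NaCl before the phenol:chloroform extraction, and the redissolved DNA is on the order of 2.5 mg/ml.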
[0095] PKS Probe design. Five degenerate PCR primers were designed (degKS1F 
5'-TTCGAYSCSGVSTTCTTCGSAT-3' [SEQ ID NO:44]; degKS2F 
5'-GCSATGGAYCCSCARCARCGSVT-3' [SEQ ID NO:45]; degKS3F 
5'-SSCTSGTSGCSMTSCAYUWSGC-3' [SEQ ID NO:46]; degKS5R 
5'-GTSCCSGTSCCRTGSSCYTCSAC-3' [SEQ ID NO:47]; degKS7R 
5'-ASRTGSGCRTTSGTSCCSSWSA-3' [SEQ ID NO:48]) based on conserved regions of 
ketosynthase (KS) domains of type I PKS genes and codon bias of high G+C organisms. The 
primers were used in the following combinations: degKS1F/degKS5R, degKS2F/degKS5R and 
degKS3F/degKS7R. The PCR conditions for the amplification of KS domains were as follows: 
A total reaction volume of 50 µl contained 100 ng of S. bikiniensis total DNA, 200 pmol of each 
primer, 0.2 mM dNTP, 10% DMSO and 2.5 U Taq DNA polymerase (Roche Applied Science, 
Indianapolis, IN). Thirty-five cycles of PCR were performed using the following steps: 
denaturation (94°C; 40 sec); annealing (55°C; 30 sec); extension (72°C; 60 sec). The 
resulting PCR reactions were subjected to electrophoresis on 1% agarose gels and the PCR 
products of approximately 700 bp were extracted from the gels using the gel extraction kit from 
Qiagen (Valencia, CA) according to manufacturer's protocol. The fragments were treated with 
Pfu DNA Polymerase (Stratagene, La Jolla, CA) to remove the A overhangs and cloned into the 
plasmid vector pLitmus28 (New England Biolabs, Beverly, MA) cut with EcoRV. Thirty-two 
"amplimers" (the ca. 700 bp PCR-amplified segment) for each primer pair were sequenced using 
standard protocols. Of the 96 inserts sequenced, 81 were found to be KS amplimers. Employing 
the sequence comparison program ClustalW, 15 of the 81 KS amplimers were found to be unique 
and were compared with the 8 KS sequences of the related tylosin PKS cluster of Streptomyces 
fradiae. Each KS amplimer was thus assigned to a particular KS 
within the putative chalcomycin PKS cluster. 
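
The degenerate primers above are written in the standard IUPAC nucleotide ambiguity code, so each primer name stands for a pool of distinct oligonucleotides. As a hedged illustration (the helper names are not from the source), the sketch below counts the size of that pool for two of the primers and shows how a short degenerate stretch expands into explicit sequences.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes (DNA).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str):
    """Yield every explicit sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    for combo in product(*choices):
        yield "".join(combo)


if __name__ == "__main__":
    primers = {
        "degKS2F": "GCSATGGAYCCSCARCARCGSVT",
        "degKS5R": "GTSCCSGTSCCRTGSSCYTCSAC",
    }
    for name, seq in primers.items():
        print(name, degeneracy(seq))  # 192 and 256, respectively
    print(sorted(expand("GAY")))      # ['GAC', 'GAT']
```

The restriction to S (G or C) at most ambiguous positions is consistent with the stated design goal of matching the codon bias of high G+C organisms while keeping the pool size manageable.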

[0096] Genomic library preparation. Approximately 10 µg of genomic DNA was partially 
digested with Sau3AI (1 hr incubation using dilutions of the enzyme) and the digested DNA was 
run on an agarose gel with DNA standards. One of the conditions used was found to have 
generated fragments of size 35-47 kb. The DNA from this digestion was ligated with 
pSuperKos plasmid (a derivative of pSuperCos (Stratagene) digested with AfeI and self-ligated to 
eliminate the neo marker) pre-linearized with BamHI and XbaI. The ligation mixture was 
packaged using a Gigapack III (Stratagene) in vitro packaging kit and the mixture was 
subsequently used for infection of Escherichia coli DH5α employing protocols supplied by the 
manufacturer. Approximately 2000 E. coli transductants were probed by in situ colony 
hybridization with DIG-labeled Sb3/7-31 (KSq), Sb1/5-75 (KS3) and Sb1/5-78 (KS7). 
Plasmids from 15 colonies that showed strong hybridization signals were isolated, digested 
with BamHI and subjected to Southern blotting employing the KSq or KS7 amplimers as probes. 
Ten plasmids showed strong hybridization with one or both amplimers. The ends of the insert in 
each of the 10 plasmids were sequenced using convergent primers (T7 promoter and T3 
promoter). Two cosmids, pKOS146.185.1 and pKOS146.185.10, were found to possess high 
homology at one end with a segment of the PKS from the tylosin biosynthesis cluster. These two 
plasmids also each gave rise to DNA fragments of ca. 1 kb and 1.2 kb after BamHI digestion. 
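
Whether roughly 2000 cosmid clones are enough to contain a given locus can be estimated with the Clarke-Carbon relation, P = 1 - (1 - f)^N, where f is the insert size divided by the genome size. The sketch below applies it under stated assumptions: an 8 Mb genome (a typical order of magnitude for Streptomyces, not a figure given in this document for S. bikiniensis) and a 40 kb average insert (within the 35-47 kb range reported above). It also assumes random, unbiased cloning.

```python
import math


def coverage_probability(num_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon probability that a given locus is present in the library."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** num_clones


def clones_needed(probability: float, insert_bp: float, genome_bp: float) -> int:
    """Number of clones required to reach the requested coverage probability."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    fraction = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


# Assumed values for illustration only.
GENOME_BP = 8_000_000   # assumed genome size
INSERT_BP = 40_000      # assumed mean insert size (reported range: 35-47 kb)

if __name__ == "__main__":
    print(clones_needed(0.99, INSERT_BP, GENOME_BP))
    print(f"{coverage_probability(2000, INSERT_BP, GENOME_BP):.5f}")
```

With these assumed values, about 2000 transductants correspond to roughly tenfold genome coverage, so screening that many colonies would be expected to recover the cluster on several overlapping cosmids.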
[0097] Identification of chalcomycin biosynthetic gene cluster. Further verification that 
cosmids pKOS146.185.1 and pKOS146.185.10 contained the chalcomycin biosynthesis cluster 
was performed by PCR. Specific primer pairs were designed for the chalcomycin KSq (Sb3/7-31 
forward 5'-CGTCAGCCTGATCCTCGCCGA-3' [SEQ ID NO:49]; reverse 
5'-TCCAGGTGGCCGACGTTCGTC-3' [SEQ ID NO:50]), KS3 (Sb1/5-75 forward 
5'-AACGAGATCCCGCCGGGCCTC-3' [SEQ ID NO:51]; reverse 
5'-ATCACGCGTTGCTGGGCGAGG-3' [SEQ ID NO:52]) and KS7 (Sb1/5-78 forward 
5'-GGACGTCTGCCGGAGGGTTCC-3' [SEQ ID NO:53]; reverse 
5'-GGCCCGTTGGGCACGGACAGA-3' [SEQ ID NO:54]) amplimers and used in PCR reactions 
with each of the 8 KS amplimers using the following conditions: total reaction volume of 50 µl 
contained 20-100 ng of plasmid DNA containing an amplimer, 100 pmol of each primer, 0.2 mM 
dNTP, 10% DMSO and 2.5 U Taq DNA polymerase. Cycle steps were as follows: denaturation 
(94°C; 40 sec), annealing (55°C for KSq and KS3 specific primers, 65°C for KS5 specific 
primers; 30 sec), extension (72°C; 60 sec), 25 cycles. Each primer set was found to amplify its 
cognate amplimer exclusively, with the exception of the primer set for KS7, which was also seen 
to give a small amount of amplification of non-cognate amplimers. Each primer set was then 
used for PCR with cosmids pKOS146.185.1, pKOS146.185.10 and pKOS146.185.11 employing 
the same conditions as described above but using cosmid DNA in place of the 
plasmid-containing amplimer DNA. pKOS146.185.1 gave correctly sized amplimers with KSq and KS3 
primers but not with KS7 specific primers, whereas pKOS146.185.10 gave a correctly sized 
amplimer with KS7 but not with KSq and KS3 specific primers, indicating that pKOS146.185.1 
contained the 5' region of the chalcomycin PKS genes. 
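
The two PCR protocols in this Example (the degenerate-primer survey and the cosmid verification) differ in primer amount and cycle number. As a small worked calculation, with helper names that are illustrative only, the sketch below converts the stated primer amounts to molar concentrations and totals the programmed hold times. Ramp times and any initial denaturation or final extension are not given in the text and are therefore excluded.

```python
def primer_concentration_um(amount_pmol: float, volume_ul: float) -> float:
    """Primer concentration in micromolar (pmol per microliter equals micromolar)."""
    return amount_pmol / volume_ul


def cycling_time_s(step_seconds, cycles: int) -> int:
    """Total programmed hold time; excludes ramping and any extra hold steps."""
    return cycles * sum(step_seconds)


# Denaturation, annealing and extension holds, in seconds, as stated in the text.
STEPS = (40, 30, 60)

survey = {
    "primer_uM": primer_concentration_um(200, 50),
    "hold_time_s": cycling_time_s(STEPS, 35),
}
verification = {
    "primer_uM": primer_concentration_um(100, 50),
    "hold_time_s": cycling_time_s(STEPS, 25),
}

if __name__ == "__main__":
    print("degenerate survey:", survey)
    print("cosmid verification:", verification)
```

The higher primer concentration in the survey reaction is consistent with the use of degenerate pools, in which any single sequence makes up only a small fraction of the total primer.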

[0098] The sequence of the insert of pKOS146.185.1 corresponds to bases 1 to 48,595 of 
SEQ ID NO:1 and the sequence of the insert of pKOS146.185.10 corresponds to bases 44,218 to 
85,915 of SEQ ID NO:1. Table 2 below provides open reading frame (ORF) boundaries 
corresponding to the nucleotide position in SEQ ID NO:1 (Table 3) of the chalcomycin PKS as 
well as the nucleotide sequences encoding enzymes involved in precursor synthesis and 
chalcomycin modification. In addition to the ORFs listed in Table 2, SEQ ID NO:1 includes 
additional open reading frames of genes encoding proteins or domains thereof that may be useful 
in the biosynthesis of chalcomycin and/or analogs thereof in certain host cells. The various open 
reading frames, module-coding sequences, and domain encoding sequences shown in Table 2 
and the figures are sometimes referred to as "subsequences." Those of skill in the art will 
recognize, upon consideration of the sequence shown in Example 1, that the actual start locations 
of several of the genes could differ from the start locations shown in the table, for example due 
to the presence in-frame of codons utilizable by the initiator methionine tRNA in close proximity 
to the codon indicated as the start codon. The actual start codon can be confirmed by amino acid 
sequencing of the proteins expressed from the genes. 
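
The insert coordinates given at the start of this paragraph imply that the two cosmids overlap. A minimal sketch of that interval arithmetic follows, using 1-based inclusive coordinates in SEQ ID NO:1 as stated in the text; the function and variable names are illustrative.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the shared region of two inclusive intervals, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def length(interval: Interval) -> int:
    """Number of bases in an inclusive interval."""
    return interval[1] - interval[0] + 1


# Insert coordinates in SEQ ID NO:1, as reported for the two cosmids.
INSERT_1: Interval = (1, 48_595)        # pKOS146.185.1
INSERT_10: Interval = (44_218, 85_915)  # pKOS146.185.10

shared = overlap(INSERT_1, INSERT_10)

if __name__ == "__main__":
    print(shared, length(shared))                 # (44218, 48595) 4378
    print(length((INSERT_1[0], INSERT_10[1])))    # 85915 bases covered in total
```

The roughly 4.4 kb of shared sequence is what allows the two inserts to be joined into the single contiguous sequence of SEQ ID NO:1.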



34 



o 

z 

Q 

o 

w 

co 



CO 
03 

| 

CO 
03 

d 
Z 



1 

00 



03 

a 

CUD 
W) 

a 

o 

e 

S 

2 

e z 

1 s 

U w 
o 



00 

o 



o 

§ 

o 

§ 

o> 

CO 

O 

o 



3 

CI 

H 



G 

03 



<D 

f 

O 

o 
x> 

X3 

03 
73 

O 

o 
G 

03 

b 



03 

i 

2 



CO 
03 

"S 
§ 

o 
42 



6 

z 

Q 

a 
w 



CO 

ro 



U 



jo 
13 

C3 

o 
o 

CO 

o 

00 

• 

I 

G 

"<3 
+■» 
o 

IH 
Oh 

Hi 



o 



o 
o 



O 

Z 

a 

a 
w 

00 



o 

oo 

CM 



o 
on 

CN 



U 

CO 



o 
o 

13 

G o 
03 . 

£ co 
a ° 

o o 

£} S 
t2 2 

o e 



CN 

4h 



co 

ON 



CN 



o 
z 

Q 

o 
w 

CO 



co 

ON 
CN 



CO 

03 

i 



o 

GO 

13 

B 
o 

S3 < 



si- 

C T3 00 

o o S 



o 



00 
OO 
CO 



oo 
o 
o 

CO 



l/N 

d 
z 

Q 

a 
w 

CO 



o 



U 



e? 

w 

>> 

I 

03 

CO 

o 
o 

13 
M 
o 



03 
CO 

**-> 

>> 00 
JG O 

■91 

o 

co j3 



oo 
o 

CN 

to 



On 
On 
CO 



d 
z 

Q 

alacy 



CO 

a> 

CO 

a 
.id 

CO 



I 
s 

I 

O 

03 

CO 

O 

o 

o 



oo 



U 



oo 
o 

I— I 

o 

£ 
o 

> 

U 
>> 

W 

03 

CO 

O 

o 
13 

o 



oo 

d 
z 

a 



CO 



CO 

oo 



o 



03 

3 
is 

8 

CO 

*co 
O 
O 



oo SP 
o 



o 

£ 
o 

% 

W 

CO 

<D 

Q 



G 

£ 
o 

13 
o 



CN 

o 



ON 

d d 
z z 

aa 



o 
'o 
£ 

o 

a 



a 

'S 
o 

I 

CO 

u 

CN 

u 

o 

Ph 



io 



U 



03 

CO 

S3 

8 
ft 

r 

0O 

u 

o 

0h 



o 
z 

a 

o 

w 

CO 



CN 
00 
CN 



1 S 

Ok ^ 

I si 

03 C 



P CO 

pG CO 

O 03 

■£ o 

CO 

O V! 



o «o 
o O 

5 0 



co 



CN 
CN 



CN 

d 

a 



u 



o 

C3 

H 03 

03 
I 

03 
CO 

O 
O 

i 

Q 

H 



O 
Z 

a 

a 
w 

CO 



^o 

d 
z 

a 

a 
w 

CO 



d 
z 

a 

OCAO 



00 

o 

£ 
o 

rG 

H 

>. 

+-» 
cd 
Cut 



oT 
i 

CO 



o 

CO 

03 
CO 

O 
G 

'& 
s 

i 

Q 



O 

a 



o 

CN 



6 
z 

a 

a 

w 

CO 



o 

CN 

g 

a 



00 

o 

o 

£ 
o 

& 

H 



v© 

CN 
CO 



00 

15 

E 
o 

JG 
H 

03 
CO 

^ ! 

2 

1 

o 

CN 

u 

o 

lO 



oo ; 

o ,1 



o 

03 

b 

03 



z 

u 

E 



as 

CN 
O 

o 

CN 
I 

ON 

in 
On 
oo 



On 
CN 
O 

o 

CN 



CN 

d 
z 

a 



da 

■ w 

CO 



CO 

o 



00 



00 

,2 
"o 

£ 
o 

43 

w 

H 
H 

I 

X 
O 

CN 

>J 

1 

a, 

03 

CO 

o 

G 



00 

£ 
o 



b 

03 

CO 
CO 



CO 

O 
o 

03 
co 
O 



Q 



VO 

r- 

CN 
CN 



ON 
CN 
CO 

CN 



to 

CO 

































































VO 






co 






































Co 1 
























CN 


























CN 












CN 




















CN 


CN 




o 


























6 












o 




















NO: 


>IDNC 






























Ah 












Ah 






















B 


























« 
































Q 




a 






































a 




















0 






w 


























w 












n 




















w 


w 






























CO 












00 




















CO 










































1 I 
































































00 




















CN 


o 










































00 






















00 






























ON 
































VO 
































i—^ 












CO 




















r— • 


u 








































































G 






















































































































1 






ain 


G 


a 

o 




























































a 






















ain 












ain 










G 




















o 


a 


00 








lain 




























'3 














< 








o 


G 

• Y-H 










.a 


G 




a 






G 


G 




a 






G 




a 




G 


.a 


a 








"SB 




OJ) 


ase loading 






"3 


se 1 domain 






*3 






o 






'3 


se 3 domain 


o 




e 4 domain 


*s 


a 


0 


e 5 domain 


'3 




se 5 domain 




nt; 23S-rR] 


rB homolo; 




e Oq loadin 


protein loa 


e 1 domain 


rase 1 dom 


protein doi 


e 2 domain 


rase 2 dom 


ase 2 doma 


se 2 domaii 


protein 2 d 




e 3 domain 


rase 3 dom 


ase 3 doma 


[Table 2, continued: the entries on this page are illegible in the source scan.]

54544-55818                        PKS Ketosynthase 6 domain
56122-57168                        PKS Acyl transferase 6 domain
57844-58707                        PKS Ketoreductase 6 domain
58753-59019                        PKS Acyl carrier protein 6 domain
59387-63439                        PKS, Module 7 and TE
59489-60778                        PKS Ketosynthase 7 domain
61112-62209                        PKS Acyl transferase 7 domain
62276-62533                        PKS Acyl carrier protein 7 domain
62549-63436                        Thioesterase
63522-64760  412 [SEQ ID NO: 28]   NDP-hexose 3,4-isomerase; D-chalcose pathway component (EryCII homolog)
64804-66081  425 [SEQ ID NO: 29]   Chalcose glycosyltransferase (EryCIII homolog)
66194-66940      [SEQ ID NO: 30]   Post PKS Ketoreductase (SimJ2, NovJ homolog)
67323-68471  382 [SEQ ID NO: 31]   Permease homolog (SCF6.09 homolog)
68733-70196  487 [SEQ ID NO: 32]   Membrane protein homolog (SC66T3.03 homolog)
70193-70888  231 [SEQ ID NO: 33]   D-alanyl-D-alanine carboxypeptidase homolog (SCD6.17c homolog)
71382-72542  386 [SEQ ID NO: 34]   Sensory histidine kinase homolog (SCE94.10 homolog)
72638-73324  228 [SEQ ID NO: 35]   Two-component syst. response regulator homolog (SCE94.09 homolog)
73651-75081  476 [SEQ ID NO: 36]   Permease (xanthine/uracil permease type) (SC9G1.02, SC9G1.04 homolog)
75401-76117      [SEQ ID NO: 37]   SC6A11.03c homolog
76537-78375  612 [SEQ ID NO: 38]   Permease homolog (SC9G1.02, SC9G1.04 homolog)
78521-79192  223 [SEQ ID NO: 39]   MerR-family transcriptional regulator (SC1A4.06c homolog)
79228-79983  251 [SEQ ID NO: 40]   Type-II thioesterase (SanP homolog)
80489-81439      [SEQ ID NO: 41]   Open reading frame
81806-82528  240 [SEQ ID NO: 42]   Response regulator homolog (SCD49.02c homolog)
82712-85912      [SEQ ID NO: 43]   Open reading frame
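
The nucleotide positions and amino acid lengths listed in Table 2 can be cross-checked against each other. The following is a minimal sketch, assuming that the listed positions are inclusive and that the final codon of each open reading frame is a stop codon; the function name is illustrative and is not part of the disclosure.

```python
def predicted_length_aa(start: int, end: int) -> int:
    """Number of residues encoded by an ORF spanning nucleotides start..end.

    Assumes inclusive coordinates and that the last codon is a stop codon,
    so the protein is one residue shorter than the number of codons.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("span must be a positive multiple of three")
    return span // 3 - 1


# Positions and lengths as listed in Table 2 (SEQ ID NOs: 28, 29, 31, 32).
listed = [
    ((63522, 64760), 412),
    ((64804, 66081), 425),
    ((67323, 68471), 382),
    ((68733, 70196), 487),
]

for (start, end), length in listed:
    assert predicted_length_aa(start, end) == length
```

Under these assumptions, the computed lengths agree with the listed values for the rows checked.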



[0099] Genes listed in Table 2 that encode proteins with post-PKS polyketide-modifying
activities include: chmPI, chmPII, chmHI (P450 homologs), chmN, chmCIII
(glycosyltransferases) and chmU (polyketide ketoreductase).

[0100] Genes listed in Table 2 that encode proteins predicted to participate in the
biosynthesis of sugar residue subunits of chalcomycin or modification of sugar residues after
their addition to the polyketide include: chmCIV, chmMIII, chmCV, chmAII, chmAI, chmJ,
chmMII, chmD, chmMI, and chmCII. Of these, three are predicted to participate in D-chalcose
residue biosynthesis (ChmCII, ChmCIV and ChmCV), two are predicted to participate in
D-allose residue biosynthesis (ChmD and ChmJ), two are predicted to participate in conversion of
the D-allose residue to a D-mycinose residue after covalent linkage of the D-allose to the
polyketide (ChmMI and ChmMII), two are predicted to provide precursors for both the allose
and chalcose pathways (ChmAI and ChmAII), and one is predicted to O-methylate the chalcose
residue (ChmMIII).
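
The predicted roles described in this paragraph can be summarized as a simple lookup table. The following is a minimal sketch; the dictionary name, the role labels, and the helper function are illustrative choices, while the gene groupings are taken from the paragraph above.

```python
# Predicted roles of the sugar biosynthesis and sugar modification genes.
SUGAR_GENE_ROLES = {
    "D-chalcose residue biosynthesis": ["chmCII", "chmCIV", "chmCV"],
    "D-allose residue biosynthesis": ["chmD", "chmJ"],
    "conversion of attached D-allose to D-mycinose": ["chmMI", "chmMII"],
    "precursor supply for allose and chalcose pathways": ["chmAI", "chmAII"],
    "O-methylation of the chalcose residue": ["chmMIII"],
}


def role_of(gene: str) -> str:
    """Return the predicted role label for a gene, or raise KeyError."""
    for role, genes in SUGAR_GENE_ROLES.items():
        if gene in genes:
            return role
    raise KeyError(gene)


all_genes = [g for genes in SUGAR_GENE_ROLES.values() for g in genes]
print(len(all_genes), role_of("chmMIII"))
```

The grouping accounts for all ten genes named in the paragraph, each assigned to exactly one predicted role.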

[0101] As noted above, the invention also provides inter-polypeptide linker sequences, which
can be identified by the skilled reader (e.g., comprising the sequences between the N-terminus
of the polypeptide and the beginning of the first KS domain; or between the C-terminus of the
polypeptide and the beginning of the last ACP domain) and polynucleotides encoding such
linkers.
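
As an illustration of how an N-terminal linker region might be located from coordinates such as those in Table 2, the following is a minimal sketch. It assumes inclusive nucleotide coordinates on the forward strand and uses the positions listed in Table 2 for "PKS, Module 7 and TE" and "PKS Ketosynthase 7 domain"; the class and function names are hypothetical and are not part of the disclosure.

```python
from typing import NamedTuple, Optional, Tuple


class Feature(NamedTuple):
    name: str
    start: int  # first nucleotide, 1-based inclusive
    end: int    # last nucleotide, 1-based inclusive


def n_terminal_linker(polypeptide: Feature,
                      first_ks: Feature) -> Optional[Tuple[int, int]]:
    """Interval between the polypeptide start and the first KS domain start.

    Returns None when the KS domain begins at the first nucleotide of the
    polypeptide coding region, i.e. when there is no linker.
    """
    if not (polypeptide.start <= first_ks.start <= polypeptide.end):
        raise ValueError("KS domain must lie within the polypeptide")
    if first_ks.start == polypeptide.start:
        return None
    return (polypeptide.start, first_ks.start - 1)


# Coordinates as listed in Table 2.
module7 = Feature("PKS, Module 7 and TE", 59387, 63439)
ks7 = Feature("PKS Ketosynthase 7 domain", 59489, 60778)

linker = n_terminal_linker(module7, ks7)
linker_nt = linker[1] - linker[0] + 1
print(linker, linker_nt, linker_nt // 3)
```

Under these assumptions the region preceding the KS 7 domain spans 102 nucleotides, corresponding to 34 codons.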



Table 3 



Chalcomycin PKS cluster from Streptomyces bikiniensis NRRL 2737 (SEQ ID NO:1)



1 GGGGCCCGCC GGACGGGGCT GCCCGGCTCT CGGCGGTGCC CGGTGGGCCG GGTGCGGGCT 

61 CGCCCGCGGC GAGATGCTCC AGGACCTCCG CCAGTTCCCG GCAGGCGCGG CGTACCGAGC 

121 GGCGGGTCGC ACGCTGCTCC GTGATGACGG AGGCGAGAAG CAGGGCGGTC AGTGCGGCGG 

181 AACCGTTGAA CGCCTGGAGC TTGGCCATGA TCTCGACGTC CGAGAGGTGG AGGAACCCTC 

241 CCCGTCCGGC GTTCGCCTCG AAGGTGGCGA GCACGGAGGC GAAGAGCGCG CAGAGCATGC 

301 TTCCGGTGAG CTGGAAGCGG AGCGCCGCCC AGATCAGCAG GGGGAAGACG AGGAAGAGCA 

361 TGCCCACCGG GCTGAGCACG GCCATGGGCA TGAGGATCAG GGTCGCGAGA CCCAGCAGGG 

421 CCGCCTCCTT CCAGCGCCGT ACGCGGAACC GTCCGGCCGG CCCCGCGAGG ACGAGGAGGA 

481 GCGGGGCGAC GAGCAGCACC CCCATCGTGT CGCCCACCCA CCAGGCCAGC CAGACGGGCC 

541 AGAACTCGGT CGTGTCCAGG GAGCTCTTCG CCACCTGCAG TCCGACCCCG GCGGTCGCGC 

601 TGATCAGCAT GGCGCCGAAC CCGCCGAGGA AGACCAGGGA GAGTCCGTCC CGCAGCCGTG 

661 CCATGTCGAG CCGGAAGCCG GCCCGTGTCA GCAGCAGGAA GGCGCAGAGC GGCGCGACGG 

721 TGTTGCTGAC CACGGTGACC ACGGTGGTGG GCCCCGGCGT GGTGAGGGAG GCGATGACGA 

781 GGAAGGAGCC GAGGGCGATC CCGGGCCAGA CGCGCGCGCC GAGCAGCAGC AGGGCGGCGA 

841 CGGCGACGCC GGTGGGAGGC CAGATGGGGG TGACCACCAC GCCTTCGACG ACGAGGCGGC 

901 CCATCAGGCC GAGTCGTCCG GCCGCGTAGT AGAGCACCGC CACGGCCAGC GACATCAGCG 

961 CCGTCGCCGC GGGGGACCGG TACTGCCGAA TATCCAACAC GTCTGCCATC AGACACCGAC 
1021 CGGGTTGCCG CGCCCCTGCA TGGCCGCCTG CGTGTCGGGG ACACGCCGCC CGGCCCGGAC
1081 CGGGCCGCGG GGAGGACCGG CCCGAGGCGG GTCCCCTCGC TCACTCCCCG GTCGCCGGCC 
1141 GGGGCGGACC GCCGTCGTGG CCGACGACGA GGACGGCCGC GTCGTCCTCG TGCCCCACCG 
1201 TGGCCGCCAG TCGCATGACG GCGGTGGCGA GCGCGTCGAC GTCCAGCCCG GCGACGGCCG 
1261 TGATCCCGGC GAGTCTCGTC ACCCGGTCGA GACCCTCGTC GATGTGGAGC GAGGGCCCCT 
1321 CCACGACCCC GTCGGTCAGG AGGACGAAGA CGCCGTCGGC GGTGAGCCGG TGGCGCGTGG 
1381 CCGGGTAGTC GGCGCCGCGC AGCACGCCGA GAGGAGGCCC GCCCTCGCTG TCCTCGATGC 
1441 CGGAGCGGCC GTCGGCGGTG GCCCAGATGT GGGGGATGTG GCCGGCCCGG GCGCATTCCA 
1501 GGGTGCCGGC GGCGGGGTCG AGCCGCAGGA AGGTGCAGGT GGCGAAGAGG TCGGCGCCGA 
1561 GGGAGACGAG CAGGTCGTTG GTGCGGCCGA GCAGCTCTCC CGGGTCCGCG GTGACGGAGG 
1621 CCAGCGCGCG CAGTGCCACG CGGACCTGTC CCATGAAGGC GGCGGCCTCG ATGTTGTGTC 
1681 CCTGGACGTC GCCGATGGAC ATGCCGATCC GCCCGCCAGG CAGGGGGAAG GCGTCGTACC 
1741 AGTCGCCGCC GACGTTGAGC CCGTGGTTGG CGGGCTCGTA CCGGACGGCC AGCCGGGCAC 
1801 CCGGAAGACT GGGCAGGTCC GAGGGCAGCA TGCCGCGCTG CAGGGCCACG GCGAGCTCGA 
1861 CACGGGACCG CTGCGTCTCG GCCAGCTCCC GCGCCCTGGC GGTCAGCGAC CCGAGCCGGA 
1921 CCAGAAGGTC CTCGCTGCTG TCGGCCGGGC GTCTGCGGAA CATCGATCAC TCCGACGTCA 
1981 CGACAATCCT CGCATCACTC CGTCCCGTCT CCAGCACGCG GGACCACAGG GGACCACCCC 
2041 GGTACGAACA GGTCCTTCCC ACTGTGCCCG GAGGGGGCGG GGTCCGCATC TCATCGGCGG 
2101 GAGAGCGCGG TGGATCCCAG GGGGCCCGCT CAGGTCACCG AAAACGAGCA AACGTTCGAT 
2161 AATGTGGTCG CGCCGGTCTG TGCGGCCGTT CAGCGTTCGA CGGTCACTGC GGCGGCCGCG 
2221 ATGCCGTCGC GCACCAGCCA CCGGCCGGTG AACCCGCCCA GCTGCCGTCC GTCGACCGTC 
2281 GGCCCGGGCA CGAGCAGCCG GGCCCTGAAA CGGCCGCCCG GCTGGAAGTC GACGGCCGCC 
2341 TCCCCGAAGT CGAGCCGGCT CCGGGTGAGC GGGAACCACG CCTTGTAGAC GGCCTCCTTG 
2401 GCGCAGAACA GCAGCCTGTC CCACGGGATG CCGGGCCGGG CCAGTGCCAG AGCGGCCAGC 
2461 GGCCCGCGCT CATCCGCGGT CGTCACCGCC TCGAAGACGC CGTGCGGCAG CGGAAGGGCG 
2521 GGCTCGGCGT CGATGCCCAG AGACGCCACG TCCGCCTCGG ACGCCACCGT CGCGGCGCGG 
2581 TAGCCGGCGC AGTGGGTCAT GCTGCCGACG ACACCGGGCG GCCATCCCGG CGCACCCCGC 
2641 TCACCGGGCA CGAGGGCCAC CGGCCCCCAC CCCAGACCGG CCAGCGCGCG TCGCGCGCAG 
2701 AAGCGCACGG AGGTGAACTC CCTGCGTCGC TTGTCGACCG CGCGCCCGAT CGCCTTCTCC 
2761 TCGTCGGCGA ACAGCCGCGC CTCGGGGTAC CGGTCGGCAT CCAGGACGTC GCCGAAGACG 
2821 TCCACCGACA CCGCGGCGGG CGGCAGCAGT GCGCAGATCA GCCCGGCCCA CCGCGGCACC
2881 GACACGGCGG CGTCGGCCTC GGCACGGTCG TGCACGCACG CCGACGCCGC GGCGTCGGCG
2941 AGGGGGCCCG TGCGGCTGCC GTCCCCGGAC CGCCGCGCCG AGGACGCCGC CGGCGGGAAG
3001 CCGTCCCTCA CACGGGCACC GTGGCGTCGT CGTGCGTGCG TCCGAGCGCG TTCAGCCGGG
3061 CCAGCTGGCC ATCGGAGAGA GCGATGCCCG ACGCCGCGAC GTTCTCCCTC AGATGCGCGG 
3121 GCGAGCTGGT GCCGGGGATG GGAATGACCG CCGGCGAGCG GTGCAGCAGC CACGCGAGCG 
3181 CCACCTGACC GGCCGACACC TCCAGCTCGG TGGCGATGTC GGCCACCGGG CTCCCCCCGG 
3241 CGGCGAGCGC ACCGCGGGCG ATCGGCAGCC AGGCGATGAA CGCGATCTCG TGCTTCTCGC 
3301 AGTACTCGAC CACCTCGTCG TTGCGCCGGT CGGTCAGGTT GTACACGTTC TGCACGCTGG 
3361 CCACGGTGAT GTGCTCACGG GCGGCCTCCA CTTCCCGGAC GGTGACCTTG GACAGCCCGA 
3421 TGTGCCGGAT CTTCCCTTCG TCCTGCAGTT CCCCGAGCGC ACCGAACTGC TCGGCCGCCG 
3481 GCACCTTCGG ATCGATCCGG TGCAGCTGGA AGAGGTCGAT GCGGTCGAGT CGCAGCCTGC
3541 GCAGGCTCAG CTCGGCCTGC TGGCGGAGGT ACTCCGGTCG GCCGCACGGC ACCCACTGGT 
3601 CGGGACGGGG CCGGCACTGT CCGGCCTTCG TCGCGATCAC GAGGCCGTCG GGGTACGGGC 
3661 GCAGTGCTTC GGCCAGCAGC TCCTCGTTGC TTCCCAGCCC GTAGGAATCG GCCGTGTCGA 
3721 TGAAGGACAC ACCGAGGTCC ACGGCCAGCC GGGCCGTGCT GATCGCAGCC TCCCGGTCCT 
3781 CCGGCGGGCC CCAGTACCCC GGACCGGTCA GCCGGAGGGC GCCGAAACCC AACCGGTGTA 
3841 CCGAGAGGTC GCCTCCCAGA GGGAAGGTGG TCCTGCCGGG CTGTGCCATG CGTTCCTCCT 
3901 GGACGACGTC CGTGCACTCG GGTGGGGCGC GTGGTGACCT CGGTGGGGGC GGGCCGATGC 

3961 CGACCGTCCG AACACGGTGG ATCCCCCGGT TCAGGGAGCC GTCGTGGAGG GCACCCGCTC
4021 GTCGGCCCGT CGGGTCACCT CGTGTCCCCG CTCCAGAGTG AGCCGCACGA TGTCGCAGAC
4081 CCGACGGATG TCCTCGCGAG AGACGGTGGG GCCGGTGGGC AGGGCGAGGA CCTTCCGCGC
4141 GAGCCGTTCG GTGTGGGGCA GGGAGACCGG CCGCTCCGAG CGGTAGGGCT CCATCTCGTG
4201 GCATCCGGGG GAGAAGTACC GCTGGCACAT GATGTTCTCC GCCCTCAGCA CCTCGTCCAG
4261 CAGGTCCCGG TGGACCCCGG TCACCGCGGC GTCGACCTCC ACGACCAGAT AGTGGTAGTT
4321 GTTGCGCTCG GCACGGTCGA ACTCCATGAC CTTCAGCCCG GCGAGCCCGG CGAGTTCCGA 
4381 CCGGTAGTCG TCGTGGTTGG CCTCGTTCCG GCGAACCGTC TCCTCGAAGG CGTCGAGCGA 
4441 GGTGAGTCCC ATCGCGGCGG CGGCTTCGGT CATCTTCCCG TTGGTGCCGG TCTCCGTGGA
4501 CACCCGCCCC TGGGTGAATC CGAAGTTGTG CATCGCGCGG ACCCGCTGGG CCAGCCTCTC
4561 GTCATCGGTC ACCACCGCGC CGCCCTCGAA GGCGTTGACC ACCTTGGTGG CGTGGAAGCT
4621 GAACACCTCG GCTTGTCCGA AGCCGCCGAT CGGCTGCCCT TCCGACGTGC AGCCGAGTCC
4681 GTGCGCGGCA TCGTAGAAGA GCCTGAGACC GTGATCGGCC GCGACCTTCG CCAGCCGGTC
4741 GACCGCGCAG GGCCGACCCC ACAGATGGAC CGGCACGACC GCACTCGTAC GGGGGGTGAT
4801 CGCCGCCTCG ATCAGATCCG GGTCCAUGCA GTTGGTGGCC GGGTCGATGT CCACGAAGAC
4861 CGGGGTCAGA CCCAGCCAGC GGAACGCCTG GGCGGTCGCC GGGAAGGTCA GCGACGGCAT
4921 GATGACCTCG CCCGACAGAT CGGCGGCGCG GGCCAGCAAC TGGAGGGCGA CGGTCGCGTT
4981 GCAGGTCGAC ACGCAGTAGC GCACGCCGGC GAGTTCCGCC ACGCGCTGCT CGAACTCCCG
5041 GGCCAGCGGC CCGCCGTTGG TCAGCCACTG GTGGTCCAGC GCCCAGTTCA CACGGTCCAG
5101 GAACCGCGCT CGGTCGCCCA CGTTGGGCCG CCCCACGTGA AGTGGTTGCA GGAACGCCGA
5161 CGGGCCGCCG AACACGGCGA GATCACCGAG ATTGCGTTTC ATGGTCATAC CTCCCCGGTG
5221 CACGGGACGG TGCACCCGGC AGCAGCCGGC ACAGGACGTG GACGTACCGG AGGGACAGGG
5281 GCCGCAGCGC ACGACGGCGT CTCCGTCGTG CGTACGCGAA AGGGGCGCAC GGAACGGATC
5341 AGTGCTCGCG GCGCCAGTAG ACGCCCGTCA CGTCCACCGT CTCGATCGGC TCGTCGATGC
5401 CGTGTTCGCT CCGGTAGTCG TGGACGGCCT GCTTGCACGC CGGGATCAGG TAGTCGTCCA
5461 CGATCACGAA CCCGCCCACG GACAGCTTGG GGTAGAGGTT GACCAGCGCG TCCCTCGTCG
5521 ACTCGTACAG GTCGCCGTCC ACCCGCAGCA CCGCGAGCCG GTCGATCGGC GCGGTCGGCA
5581 GCGTGTCGGC GAACATGCCG GGCAGGAACC TGACCTGCTC GTCCAGCAGG CCGTAGCGGG
5641 CGAAGTTCTC CCTCACCTGC TCCTCGGAAC AGCTCAGTAC CCAGTTCAGG TGGTGGAACT
5701 CCATCGCCCG GTCGAGCGGG TGGCTGTCCT CGGACGTCAC CGGGACACCG GCGAACGAGT
5761 CGGCGAGCCA CACGGTGCGG TCCTCCACAC CGTGCGCCTT GAGGAGCGCG CGCATCAGGA
5821 TGCAGGCGCC GCCGCGCCAC ACGCCGGTCT CGATGAGATC ACCGGGAACA TCGTCCTGGA
5881 GGACCCTCTC CACGCACCTC TGGATGTTGT CCAGGCGTTG CAGGCCGATC ATGGTGTGCG
5941 CGACCGTCGG CACGTCCAGT CCTTCGGCCC GGTGGTCGGC GTCGAAGGTC CCGCGCTCGA
6001 CGTCCGGGCC CTCCGCCCCC GGACGCATGC GGTCCGCCAG TTCCTTGTCC AGGACGTTCA
6061 TCCCGACCGG GAAGGTCGGG TGATCCCCGT AGATGGTGTT CGACACGACG TTCTTCATGA
6121 GGTCCACGTA GAGCTCGCGG GTCTGCACGG TCGACTTACC TCCGGCTAGA TATCGAACAG
6181 AGAGAGAATG TGCGGGGCGA CGGCGTCCGC GGACGCGGTG GGAGCCGTCG CGGTCGGGTC
6241 ACCTCAGGAA GCCGCGCGCG TCCTCCCAGC CGGCCACGAC ATCGGCCTCC AGCTGGTTGA
6301 GCCGCGCCGC CACCACCTGG TCGAACCCGT CCATGAAGTA CTCGTCACCG GCGTGCGGCG
6361 CTATCAGACG GCCGTCGTCC ACGAACCGCT CGACGACCTC CGTCAGGGAG GTGCCCGGCC
6421 CCACCCGGCC CGCGACGTAC CGGTCCGCTC CGGTGAGATC CGGGAACCCC GCCTCCCGGT
6481 ACAGGTACAC GTCACCCAGC AGGTCCACCT GTACGGACAC CTGGGGATGC GCGGACCGGC
6541 GCATGGTTCC GGGCCTGATC CTGAGCAGCT CGGCGTCCGC GCCGGTCTTC AGGCTGTGCA
6601 ACGCGTAGCC GTAGTCGATG TTCAGCGTGG GGGTCCGCCG GCTCACCGCC TCCTCGAACG
6661 CGAGCAGCCC CTCCTGGAGT TCGGCGCGTT CGGCCTCGAA CAGCCTGCCG TCCTCCCGGC
6721 CGCTGTAGTC CTCGCGCACG TTCAGGAAGT CCACCGGCCG GTCCGGCGCG GCCTCGTTCA
6781 GGTCGGCGAT GAAGTCGACG AGGTCGAGCA GCCGGTGGAC GCGCCCGGGC AGCACGATGT
6841 AGCTGAGCCC GAGCCGGATC GGGCTCTCCC GCTCGGACCG CAGCTGCTGG AAGCGCCGCA
6901 GGTTGGCGCG GACCCTTCCG AAGGCGGCGC GCTTGCCCGT GGTCTCCTCG TACTCCTCGT
6961 CGTTCAAGCC GTACAGCGAG GTGCGCACCG CGTGCAGGTC CCACACCCCG GGCTGCCTGT
7021 CCAGCGTCCG CTCGGTGAGC GCGAACGAGT TCGTGTACAC GGTCGGCCGC AGCCCGCGAG
7081 CCGCCGCACG GCTGCTGAGC GCGCCCAGGC CCGGGTTGGT GAGGGGCTCC AGACCGCCCG
7141 AGAAGTACAT GGCGTACGGG TTGCCGGCCG GGATCTCGTC GATGACGGAC GCGAACATCG
7201 CGTTGCCGGA CTCCAGCGCC GACGGGTCGT ACCGCGCACC GGTCACCCGG ACGCAGAAGT
7261 GGCAGCGGAA CATGCAGCTC GGCCCCGGGT ACAGGCCCAC GACATAGGGG AAGGCCGGCT
7321 TGTGCGCGAG CGCCGCGTCG AAGACCCCCC GTCGCTCCAG CGGCAGCAGG GTGTTCTGCC
7381 AGTACTTGCC CGCCGGCCCG TTCTCCACGG CGTGGCGCAG CTCCGGGACA CGGCCGAAGA
7441 GGCCGAGGAG CCGGCCGAAG GAGGCCCGGT CCACGTCCAG CATCCGGCGT GCCTCCTCCA
7501 GGGGCGTGAA GGGGTGGTTC CCGTAGCGCT CGGCCAGCCG CACGAGCCGG CGCGCGACCG
7561 TCGGGCCCTC GTCGACACCG ACGCGGCCGT CGGAGACCAG GGCGCGGTGC ACCGTGTGGA
7621 CCGCTGCCGG CAGGTCGGCG CCGGGGCGGG CGCACAGGGC GACCGGATCG GCCGCGGTGG
7681 AGGCGTGCGA GGTCATCCTT CGTGTCCAAT CGTCTGCGTC CGGAGGGTCG GTCCGTTCCC
7741 CGGGAAACGC GGCGGGGCGG TCACCGGGAC GGCACCGTGA CGGATGTGGT CAGGGACGTC
7801 TCGGCCGCGG ACGGGCCCAC GTGTACGGCC CTCCTGCCGG TGCCCGGCTT CCACGTGCCG
7861 GCGGCGGTGT CCCAGTAGCG CAGTTGCTGC TCGTCGACGG GGACGGTGAC CCGCCGCTGT
7921 TCTCCCGGCC GGAGGGTGAC CTTGGTGAAT CCGGCGAGTT TGCGCAGCGC CTGGGGGGCC 
7981 TGGGTGTCCG GACTTGCCCC GAGGTAGACC TGGACGGTCT CCCTGCCCGC CCGGTCACCG 
8041 GTGTTGCGGA CGGTGACCGT GGCCTCCAGC CCCCGGGCGG TGCGCGCCAC GGACAGGTCG 
8101 CTGTAGGCGA AGGTGGTGTA CGACAGGCCG TACCCGAACG GGAAGAGCGG GGCCACGCCG 
8161 GTCCGGTCGT ACCACCGGTA GCCGACATCC AGTCCCTCGG AATAGGTCTC CTTGCCGTCG 
8221 ACACCCGGGT AGCGGTGCGG GTCACCGGCC ATGGGATGGC CGGTCTCGCT CCCCGGGAAG 
8281 GTCTGGGTCA GCCGGCCGCT GGGATTCGCG TCGCCGAACA GCAGGGCAGC GGTGGCCTCC 
8341 GCGCCCTCCT GCCCCGGGTA CCACATCTCC AGTACGGCGG CCGTCTCACG CAGCCAGGGC 
8401 ATGAGCACCG AGGAACCGGT GTTGAGGACG ACGATCGTGC GGGGGTTGAC CTTGGTGATC 
8461 GCCGCGATCA GCTCGTCCTG TCGGCCGGGC AGGGACAGCG AGGACCGGTC CACGCCCTCG 
8521 GCACTGTCGT CGTGGGCGAA GACGACGGCG GTGCGGGCCT CGGCCGCCGC CGCGACGGCG 
8581 CGGTCGAAGT CGTCCCGGGC AGCCTCGGGA GTGACCCAGC TCAGCTCCAC CGACAGCGGG 
8641 GTCTGTTCGA AGGCCCAGCC GTTCATGGTC ACCGCGTGGG TGCCGCGGGT CAGCCGCATC 
8701 GTGGTGGTGA CGGTGCCGAA CGCCTCGGTG CCGAGGACAG CGGACTGCCC CGCGATCTGC 
8761 AGGTTGGCGA CCCCGCCGAC GGCGGTGAAG GCGATCTTGT AGTCGCCGTC GGCGGGGACG 
8821 GTGAGCCGAC CCTCGTAGAA GACGCCCTGG CCGCTGGGAT CCAGTTGCTT CCCGCTCGTG 
8881 AAGGCGGGGG TGAGCGAGCT TTCGGGCACG GGCACGCCGA CCAGGTCCTC ACCGGGCTCG 
8941 TGGACGACCC GGGCCTTCTC GCCGGCGCGC CGTGCGAGGG CGTCGACGGG TGCGGTCGCC 
9001 CGGTCGGGTA TGACGTGCGC GCTGCCGTTG CCGGTGACCT TGGGGTGCTT GGCGCTGTTG 
9061 CCGATGACGG CGATGTCCGT GGCGTCCTCA CCGGTCAGGG GCAGCGTCCG GCGCTCGTTG 
9121 CGCAGCAGCA CCCCGCCGTT CTCGGCGATG GTGCGGGCCG CCGCGCGGCC GGCCTCGGGA 
9181 TCACGCTGCG GGCGCTCGGT CGCCTTGCCG TCGAGCAGGC CGAAGCGCTC CATCTGGCCG 
9241 AGGATGCGGA CGACGGACCG ATCGAGCGTC GCCTCGGGGA CGCTGCCGTC CCGCACGGCG
9301 GCCCTGAGCG CGGAGGAGAA GTACTTCGAT TCCGGCACCG GCTGCCCCAG GGTCAGCTCG 
9361 ACGCCCAGTT CCTGGTCGAG GCCCCTGGTG ATGTCGCCGG TGGCGTGGGT CGCCAGCCAG 
9421 TCGGACACCA CCCAGCCGCG GAAGTCCCAC TGTTCGCGCA GCACCTCCTG CAACAGGTGC 
9481 TCGTTCCCGC ATGCGTGCGC GCCGTTGACC TTGTTGTACG AGCACATCAC CGAGGCGGCG 
9541 CCCGCCCGCA CGGCGCTGCG GAAGGCCGGC AGCTCCACCT CCTGGAGCGT CTGCTCGTCG 
9601 ACCACGGCGT CGATGGTCTC CCGCTGGTAC TCCTGGTTGT TCGCGGCGAA GTGCTTGACC 
9661 GTGGCCATCA GGCCCTGGTT CTGGATGCCG CGGACCTCGT GGGCGGCCAT CCGGGACGAC 
9721 AGCAGAGGAT CCTCGCTGTA GGTCTCGAAG TTCCGCCCCG CGTGCGGCAC CCGGATGACG 
9781 TTGGTCATGG GACCGAGGAC GATGTCCTGT CCGAGGGCCC GTCCCTCCCG GCCCAGGACG 
9841 GTGCCGTACT CCTCGGCGAG GCGTTCGTCG AAGGTGGCGG CGAGCGCCAC GGGGGTGGGC 
9901 ATGGCGGTGG CGGTACCGCC GAGCAGACGG ATGCCGACCG GGCCGTCGGC GGTGCGCAGC 
9961 TCGGGGATGC CGAGCCGGGG CACGCCCGGC AGATAGCCGA TGCCGGTCAT GGTCGGCCCG 
10021 GCCACGGGCC CGGTGGTCCA GTGGACGAAC GAGATCTTCT CGTCGAGGGT CATCTTCGCC 
10081 ACCAGCCCGC GCACCCGCGC CGTCCCGGGC TCGGCGGCCG GAGCACCGGC GGCGGAGGGG 
10141 GCGGTGAGCA GACCGCCGAC GCACAGGGCC GCGAGCGCGG ATCCGACGGT GCGCCGGATG 
10201 CGGCGACCCG TCCTCGTCGT GCCGAACAGC ATGCTGACGG ACGTCCTTTC TGCCGAGGTT 
10261 GCCGTCCTCA TCGGGGCGGG AACGTTTCTG TCCGTGCCGC CATGATCACC AGCCCACGAG 
10321 CATCGTGCGC GGGCCCCGCA CCACCATCTC CGTCTTCCAC TCGACGGGCT GGGCGAGGTG 
10381 CAGCCCGGGG AGTTCGTCGA GCAGGGTCCG CAGCGCCTCC TGGAGCTCCA GCCGCGCGAG
10441 GGACGCGCCG AGGCAGTGAT GCGGACCGTG TCCGTACCCC AGGTGCCGGC CGGCGTCGTC 
10501 GCGGGTGATG TCGAGCACGC CGGGCGAGCG GAAGCGCAGG GCGTCCCGGT TGGCGGCGTT 
10561 CATCTGGACC AGCACCGGAT CCCCCGCCCG CACCAGCGTC CCGCCCACCT CGACGTCCTC 
10621 GGTGGCGTAG CGCGGGAATC CCGCCTGGCT GCCGAGCGGA ACGAACCGCG TCAGCTCTTC 
10681 CACCGCGTTG TCCAGCAGGT CAGGACGGTC CCGGAGGAGG GCCAGCTGGT CCGGCTGTTC 
10741 GAGCAGGACC AGGACGAAGT TGCTGATCTG GCTGGCCGTG GTCTCGTGCC CGGCGAACAG 
10801 GAGGAAGACG ATCAGGTCGA CCAACTCCTC CTGCGACAGC CGGCCCTGGG CGTCACGGGC 
10861 CTCGACGAGC GCCGAGACGA GATCGTCTCG CGGCGCGGCG CGGCGGGCCG TGATCAGGTC 
10921 CGCCAGATAG CCGGTCAGCT CGCCGGCCAG CCGCACATGG TCCTCGGCGG TGAGCGAGCT 
10981 GGTGGACATC GCTATCTCGC ACCAGCCGCG GAACAGGTCA CGGTCCTCCT CGGGCACGCC
11041 CAGCAGGCCG CAGATGACCG CCACGGGAAG GGGGACGGCG TAGTGGTCCA CGAGGTCGAC 
11101 GGGCGATCCG AGCGCCGTCA TGTCCCGGAG CAGCGACGCC GTCATCCTTC GGACGTGCGG 
11161 CCGCATGGCC TCCACGCGGC GCGGGGTGAA CGCCTTGGTG ACCAGCCCGC GCAGCCTGGT 
11221 GTGGTCCGGC GCGTCCATGC CGATGATGCC GTTGGGCGTC CGCAGCTCCC GCATGCGCGG 
11281 CGCGTCGTTC TCCGCCTCCG CGGCGTGGCT GAACCGCCGG TCGCCGAGGA CGAAACGCGC
11341 GTCGTCGTAA CGGGAGACGA GCCAGCCCGG CTCCCCGTAC GGCATGTGGA CCCAGAGGAG
11401 TCCTTCGCTC TTCCGCGCCT GCTCGTACGC CTCGTCCAGG TCCAGTCCCT CCGGGCGGCC
11461 GAAGGGATAG GAGACGGGTG CGGTCTTCGT GCTGGTCAAG GGGAGCCCCA TATCGCGTCG
11521 TTCGGGTGCT TAATCGTCGT GGGCCGGATC ATGCGGCCGA GCCGCCCCAG GTGACGGGGA 
11581 CCGCCTCGGG GGTGCGCAGG AGGGAGCCGG CCCGCCAGCG AAGCTCACTC GCGGGGACGG 
11641 ACAGCCGGAG CTCCGGUAUG CGCTCGACCA GGGTGCTCAG CACGACCTGC AGTTCGAGCC 
11701 GGGCCAGGGG GGCGCCCATG CAGTAGTGGG CGCCGTGACC GAAGCCGAGG TGCGGGTTCT 
11761 GGGCACGGGC CAGGTCGAGG GAGGCGGCAC CGGGAAACGC CGTGTCGTCG AGGTTCGCGG 
11821 ACGCGATCGA GGTGATGACC GCCTCGCCCG CGCGGATGAG CGTGCCGCCG ATCTCGACGT 
11881 CCTCCAAGGC GATGCGGGCG AACCCGTCCA GGGTGCTCAG CGGGGTGAAG CGCAGCAGCT 
11941 CCTCGACGGC GGAGGGAACC AGGTCAGGTT CGGTCACCAG GCGCTCGTAG AGGCGGCGGT 
12001 CGTCGAGCAG GAGGAGGACG AAATTGCCGA TCTGGGTGGC GGTCGTCTCG TATCCGCTGA 
12061 TCAGCAGGCC GGAGCCCATC ATGAGGAGCT CGGCCTCGGA GAGCCGGTCG CCTTCGTCAC 
12121 GGGCCTGGAC CATGGTGCTG AGGAAGTCGT CGGTGGGCCG GGCGCGGCGA TCGGCCACGA 
12181 GGTCGGCCAT GTAGGCGGAG AAGTCCTCGG TCGCCCGCTG CACTTCGGCG GGCCCCAGGG 
12241 AGGAGGACAC CAGCGCCTCG GAGAAGTCCC GGAAGAGGTG GCGGTCGTCG TAGGAGACGC 
12301 CCAGCAGTCC GCAGATCACG GAGATCGACA CGGGCAGCGC CAGGTCTTCC ATGACCTCGC 
12361 CGGACGGCTC GTCGCGCGCG ACCATGCCGT CGAGCAGTTC ATCGGTGAAC CGTGAGACCC 
12421 GCGGGCGCAG TTCCTCCACC CGGCGGGCGG TGAACGCCTT CCCGGCGAGC CGCCGGAGCC 
12481 TGGTGTGTCC GGGCGGGTCC ATGGAGATGA GACCGCCGGC GGGCAGCGGG TACTCGGTGG
12541 GCCGGGGCAC GTCACGTCCG GCCCCGGCCT CCATGCTGAA CCGCGGGTCT CCCAGGACCG 
12601 CCCTCACGTC GGCGTGTCGG GTCAGCAGCC AGGCCTCACC GCCGTACGGC ATGCGGATCC 
12661 GGGACACGGG CTCGTGCTCC CGCAGACGGC TGTAGAGCGG ATCCAGCGCG ATGCCCTCGG 
12721 GCGGGGAGAA GGGGTACGGG CGAGTCCGGT GGACCGGACA GCTAGACGTC ATGGACACTC 
12781 TCTCGGTCGT ACGGGGCGGC GGCAGAGCTC GAAGGGTGCG GGCGGCCGAC GGCGAGCCCG 
12841 GTCGCCGTCC GGATCGTCCG GCACACCGCT TCGGCCTGCT CGGTGAGGTA GAAGTGGCCT
12901 CCGGGGAAGG AACGGAACGC CGTCTCCCCG GTGGTCAGCC CGTGCCAGGC CAGGGCTCCG
12961 TCGGTGGGGA CGTGCGGATC CGCGCTTCCC GTCAGCACGG TGATCGGGCA GGCGAGCGGA
13021 GCTCCCGGAC GCCACGTGTA GGTGGCGGCG GCCCGGTAGT CGTTGCGGAT CGCCGGCAGG
13081 GCCATCCGCA GGACCTCCTC GTCGGCCAGC ACCCGGGGGT CGGTGCCTTG CAGTTCGGCC
13141 AGCTTGGCGA CCAGCCGGTC GTCGCTCAGC AGGTGCGCGG TGGTCCGGTG GGCGACGGTC
13201 GGGGCGACGC GGCCGGAGAC GAACAGATGC GCCAGCCTGC TCTCCGGCGC CTGTTCCGCG
13261 AGCCGGCGGG CGGTCTCGAA GGCCACCAGC GACCCCAGGC TGTGACCGAA GAGCGCCACC 
13321 GGCCGGTCCA GCCAGGGGCC GAGCGCTCCG GCGACGTGCG TCACCAGAGC GTCGACGGAG 
13381 TCCACGAACG GCTCCAGCCG CCGGTCCTGC CGCCCGGGAT ACTGGACGGC GAGCACCTCG 
13441 ACCCGGTCCG GCAGCAGACA GGTGAAGGGG AGGAAGGCGC TGGCGCTGCC GCCGGCGTGC 
13501 GGAAGGCACA CGAGCCGGAG CGCGGGGTCG GCGACGGGCC GGTAGCGGCG CAGCCAGAGG 
13561 TCGCCGGCGT TGGTGGAGCC GGGCGCCGTC GTGGCGGGCC CCTCGTTCAT GCGTGATCCG 
13621 CTTCCGGACG TCACGGTGTC AGCGCCTTCC ACCACGTGCC GTTGTCCCGG TACCAGTCGA 
13681 CGGTCTCCGC GAGCCCGCGA TCGAAGGCGA CGCGGGGAGC CCATCCCAGC TCCTCGGTGA 
13741 TCTTCCGGGT GTCCACGCAG TAGCGACGGT CGTGCGCGGC CCGGTCCGGC ACCCGGTCCA 
13801 CCACGGACCA GTCGGCACCG AAGACGCCGA GAAGACGACG GGTGAGGTCG ACGTTGGACA 
13861 GTTCCGTCCC GCCGCCGATG TTGTACACCT CTCCGGCCCG CCCCCGGGCG GCGACCATGG 

13921 CGATGCCCCG GCAGTGGTCG TCCACGTGCA GCCAGTCGCG CCGGTGGAGG CCGTCCCCGT
13981 ACAGGGGAAG GCGCAGGCCC TCCAGCAGAT GGGTCACGAA GAGCGGGACG ACCTTCTCGG
14041 GGTGCTGGTG GGGGCCGTAG TTGTTGGAGC AGCGCGTCAC CCGGACGTCG AGGCCGTGGG
14101 TACGGTGGAA GGCGAGCGCC AGGAGGTCGG AGGAGGCCTT GGACGCCGCG TAGGGGGAGT 
14161 TCGGGTCCAG CGGATCGGCC TCGGTGGAGG AGCCCTCGGG GATCGATCCG TACACCTCGT 
14221 CGGTGGAGAC GTGGACGAAG CGGCTGACGC CCGCTTCGAG CGCCGACCGC AGCAGGGTCT 
14281 GCGTCCCCAG CACGTTGGTG GTGACGAAGG CGGCGGAGTC CGTGATCGAG CGGTCCACGT 
14341 GCGACTCGGC GGCGAAGTGC ACCACCAGGT CGGTGCCGCG CAGGGCCTGG GCCACCGTGG 
14401 GCGGGTCGCA GATGTCGCCG TGCAGGAAGG TGTGACGGGG GTGGCCGGCG ACCTCGGCGA
14461 GGTTCTCGGT GTTCCCGGCG TACGTGAGCT TGTCCAGCGA GAGGACGTGG GCGCCGGCCA 
14521 GCTCCGGATA GGAGCCCGAC AGGAGCTGTC TGACGAAGTG CGAGCCGATG AAGCCCGCTG 
14581 CCCCGGTCAC CAGGACGCGC ATCCTTGATC CGCCTTTCTG CCGTGCTGTG TGGGGGAGGT 
14641 GTGGTCAACC GGTGCGCGTC GCCCCGAGTC CCGAGGCGAC CTCCATGAGG TAGTCCCCGT 
14701 AGCTGGAGGC GCTCAGCTCC TCACCCAGCC GGTAACAGGT GTCCGCGTCG ATGAAGCCCA
14761 TGCGCAGCGC GATCTCCTCG AGACAGGCGA TGCGTACGCC CTGGCGCTGC TCGAGGAGCT
14821 GGACGTACTG ACCCGCCTGG AGCAGGGAGT CGTGGGTTCC CATGTCGAGC CAGGCGAATC
14881 CACGTCCGAG CTGGGTCAGC CGGGCGCGGC CCTGCTCCAG ATAGGACAGG TTGATGTCGG
14941 TGATCTCCAA CTCGCCTCGG GCTGAGGGCT TGAGGTCCTT CGCCAGCTCC ACCACGTCGT
15001 TGTCGTAGAA GTACAGCCCG GTGACGGCGA GGTTGGAGCG AGGACGGCTC GGCTTCTCCT
15061 CCAGGGACAA CAGGTGCCCC TGCTCGTCGA iCTCGGCGAC GCCGTAGCGC TCGGGCGACT
15121 TGGACGGATA GCCGAACAGT TCACATCCGT CGAGACGTCG CAGGGAGTGC TTCAGGACGG
15181 TGGAGAATCC GGGGCCGTGG AAGACGTTGT CGCCCAGAAT GAGCGCCACC CGGTCGTTCC
15241 CGATGTGCTT GTCGCCGACC AGGAACGCGT CGGCGACACC GCGCGGCTCG TCCTGGACCG
15301 CGTAGTCGAG CTGGATACCG AGGCGCGAGC CGTCGCCGAG CATGACCTGG AACGTCTCCA
15361 CGTGCTGGTG CGAGGAGATG ATGAGGATGT CCTGGATTCC CGCCAGCATC AGCACCGACA
15421 GCGGATAATA GATCATGGGT TTGTCATAAA CGGGCAACTG CTGTTTGGAA AGCGCACCGG
15481 TCAGCGGCTG CAGTCGAGTG GCGCTTCCCC CGGCGAGGAT TATCCCCCTC ACTCCGGGGC
15541 GTTTCAGCGG TGTCTCGAAC ACGGTTGGTC CTCCGTGGTC ACATGGCCGA TATGGGGGGT
15601 GAAGACACTG TCCTGAGAGG CCGGCGGACC GGCTGTCGCC TCGCGGACAC AGCGGCTTAA
15661 TGCATTCACC CCGCCCCGGG ACCGTCATCC GAGAAGAAGG AATGCGGTGT CGTGGGAACC
15721 GACGTCCAGG AGTTCCTTTC GGGCCGCGGA ACGGCGGCGC GGAGATTCTG AACCGCGGGG
15781 GATTCCAGGG CGGTGGCAGG GAAGGGAACC ACCGCCGCGC CATCTCTCCC GGAACGTTCC
15841 GCAAGCGGCG GGCCGTGCTT CGGACGGCCT ATCTCTGCGC CTGTTGCTGT TCCTGCCAGG
15901 CCTGATAGGT CGGCAGCAGA CCGAGCCGTT CGGCCGTCGC GAGGGTCGGG GCGTTCTCGT
15961 CCCTGTCGGA GAGCAGCGGC TCGATGTCGT CCGGCCACGC GATTCCGAGG TCGGGGTCGA
16021 GCGGATTGAC GGAGTGCTCG CGTGCGGGAG CGTATCCGGA GGAGCAGAGG TAGACGAGCG
16081 TGGCGTCGTC GGTGAGGGAG AGGAAGGCAC GGCCCAGTCC CGCGGTCAGA TAGACGGCGG
16141 TGTTGCGTTC CGCGTCCATG GGCACGATCT CCCAGCGCCC GAAGGTCGGC GAACCGATGC
16201 GGACGTCCAC GACGACGTCG AGGCCGGCTC CCCGCACGCA CACGCTGTAC TTGGCCTGAC
16261 CTGGCGGGAT CTCGGTGTAG TGGATCCCGC GCAGCGCGCC GCGGTGCGAC ACCGCGACAT
16321 TGACCTGGGC CACCGGGAAG TCATGGCCGA ACGCCTGACG GAAGCTCTCG CCCCGGAACC
16381 ACTCGTGGGA CCTCCCGCGG TGGTCGGAGT GGATGACGGG TTCCTGCGAC CAGGCCCCCT
16441 CGATGCTGAG TGGATGCATC GCGCCTTCTC CTTCGGACCG ATGGGTGGGG TGCGGGGCGG
16501 GCCGGGGGCA CGGCCGAGCC GGTCAGCTGG AGCGTCTCCA GTAGGAACCC TGGCGATCGA
16561 TCTGGTGGAT CTCGTCCGTG ATGCCGTGCT CGTCCCGGAA CTCGTGGACG GCCTGCCGGC
16621 ACGCCGGAAT GCAGTAGTCG TCGACGATGA CGTACCCGCC GTCCGACACC TTGTGGTAGA
16681 GGTTGGTGAG AACCTCCCTG GTGGCTGCGT ACGAGTCCCC GTCGAGCCTC AGCACCGCGA
16741 GCCTCTCGAT GGGCGCGGTG GGCATCGTGT CCTTGAACCA GCCCGGGAGG AAGCGGACCT
16801 GGTCGTCCAG GAGTCCGTAG CGCGCGAAGT TCCCCTTCAC GGTCTCCACG TCGACCGGGA
16861 TGCTGAGGAC GTCGTTGTAC TGGCCGAGGT CGATGTCGAC GTCCAGCTGG TGGTCGTCCT
16921 CGGTGGTCTT GGGGAAACCC TGGAAGGAGT CCGCGACCCA CACCTTCCGG TCCCGCACGC
16981 CGTGCGCCCG GAAGACGCCG CGGGCGAAGA TGCAGGCCCC GCCCCGCCAG ACCCCGGTCT
17041 CCGCGAAGTC CCCCGGCACG CCGTCGCGCA GCACGTCCTC CAGGCACTTC TGCAGGTTGT
17101 CGAGCCGCTT GAGACCGACC ATCGAGTGCG CCACGCGCGG AAAGTCCTCA CCCACCGAGC
17161 GCAGTTCCGC GGAATACGAG CTGCTGGTGA TGAGGCCGGC GACATTGGTC TGGTCCTCGT
17221 AAATCATGTT CGTCACGACC TTCTTCAGAA GGTCGAGATA CAGGTCCGCT TCGGCAGCTA
17281 TGACAGTCAT TTTCCTCACT TACGGGTAGC AGTGCCCAGC GGGCGGCTCG TTCAGGACGG
17341 GGGCCGCCGG GGCTGAATTC CCTGTGTCCA CACAGATGAG GTGGATGAGG TGGATGAGGT
17401 AGCCATCTAA CCCCAGTGAT CAGATTCGGG CAAGGGTCGA AAACGAGCCA CGTCTTATGT
17461 CGATCCTGTC CGGAAGCGAG GGGCATATGG TGCAGTGGCG ACTGCGGCCG ATCTGGCTGA
17521 TCCTTGCTGC GGCGCTGACC GTGTGCTTGC ATGTTGGACG TAGATCACCT TCTCCCGATT
17581 GCATTCAGGG TGAGGAAATC CATGAAATCT TCAAAAGTCG TTCACAGCCG TCCTGCGGAA
17641 GCGGGCGTCG CATGGCCCGT CGCGCGAACC TGCCCCTTTA CGCTCCCTGA TCAGTACGCA
17701 GAGAAGCGCA AGAACGAGCC CATATGCCGG GCTCAGGTCT GGGACGACTC CAGAACCTGG
17761 CTCATCACCA AGCACGAGCA CGTACGAGCC CTTCTCGCCG ACCCCCGGGT CACCGTCGAC
17821 CCGGCCAAGC TGCCCAGGCT CTCCCCCTCC GACGGCGACG GCGGCGGCTT CCGGTCCCTG
17881 CTGACCATGG ACCCCCCGGA CCACAACGCC CTCCGCCGCA TGCTCATATC CGAGTTCAGC
17941 GTGCACCGGG TCCGGGAGAT GCGCCCGGGC ATCGAGCGCA CCGTGCACGG GCTGCTGGAC
18001 GGGATCCTCG AACGGCGGGG GCCGGTGGAC CTGGTGGCCG AACTCGCGCT GCCGATGTCC
18061 ACCCTGGTGA TCTGTCAGCT CCTCGGAGTG CCCTACGAAG ACCGCGAGTT CTTCCAGGAA
18121 CGCAGCGAAC AGGCCACCCG GGTGGGCGGG AGCCAGGAAT CGCTGACCGC GCTCCTGGAA
18181 CTACGGGACT ACCTGGACCG GTTGGTCACC GCGAAGATCG AGACGCCGGG TGACGACCTG
18241 CTGTGCCGGC TCATCGCCAG TCGACTGCAC ACCGGTGAGA TGCGACACGC CGAGATCGTG
18301 GACAACGCCG TGCTGCTGCT CGCCGCCGGC CACGAGACCA GTGCCGCCAT GGTGGCACTG
18361 GGCATCCTGA CACTGTTGCG GCACCCCGGC GCCCTGGCGG AGTTGCGGGG CGACGGTACG
18421 CTGATGCCGC AGACGGTCGA CGAACTCCTG CGTTTCCACT CCATCGCGGA CGGCCTTCGA
18481 CGGGCGGTCA CGGAGGACAT CGAACTCGGC GGCATCACGC TGCGCGCCCG AGACGGCCTC
18541 ATCGTCTCGC TGGCCTCCGC CAACCGCGAC GAGAGCGCCT TCGCCTCCCC GGACGGCTTC
18601 GACCCGCACC ATCCGGCGAG CCGGCACGTC GCCTTCGGCT ACGGCCCCCA CCAGTGCCTG
18661 GGCCAGAACC TGGCCCGGCT GGAGCTGGAG GTCACCCTGG GCGCGGTGGT GGAGCGCATT
18721 CCCACGCTCA GGCTGGCCGG CGACGCCGAC GCACTGCGCG TCAAACAGGA TTCGACCATC
18781 TTCGGGCTGC ACGAGCTGCC GGTCGAGTGG TGACGGAAGG AGGACACAGC GTGCGGGTGA
18841 CAGTCGACCA GAGCCGGTGC CTGGGAGCCG GCCAGTGCGA GCAGCTGGCG CCGGAGGTCT
18901 TCCGCCAGGA CGAGGAAGGA CTGAGCCGGG TCCTCGTCCC CGAGCCCGAT CCGGCGTCAT
18961 GGCCGCGGGT GCTCCAGACG GTGGACCTCT GCCCCGTACA GGCCGTCCTC ATCGACGAGG
19021 GCCCCGGTCC CGCGCCGCAG GACACCAAGT GACCGCTGAC CGCTGGGCCG GCCGCACGGT
19081 GCTCGTCACG GGAGCACTCG GGTTCATCGG CTCCCACTTC GTCCGACAGC TGGAGGCGCG
19141 CGGAGCCGAG GTGCTCGCCC TCTACCGCAC CGAACGGCCG CAATTACAGG CCGAGTTGGC
19201 CGCGCTCGAC CGAGTACGCC TGATCAGGAC GGAGCTGCGG GACGAGTCGG ACGTGCGAGG
19261 GGCCTTCAAG TACCTGGCAC CCTCCATCGA CACCGTCGTC CACTGCGCGG CCATGGACGG
19321 CAACGCGCAG TTCAAGCTGG AGCGCTCGGC CGAGATCCTC GACAGCAACC AGCGGACCAT
19381 CTCCCACCTG CTGAACTGCG TACGGGACTT CGGCGTCGGC GAGGCCGTGG TCATGAGCTC
19441 CTCCGAGCTG TACTGCGCGC CGCCCACCGC GGCGGCCCAC GAGGACGACG ACTTCCGCCG
19501 ATCCATGCGG TACACGGACA ACGGCTACGT CCTGTCCAAG ACCTACGGCG AGATCCTGGC
19561 CAGGCTCCAC CGCGAGCAGT TCGGCACCAA CGTCTTCCTG GTGCGACCGG GCAACGTCTA
19621 CGGGCCGGGA GACGGCTACG ACCCCTCCCG GGGCCGGGTG ATCCCCAGCA TGCTGGCCAA
19681 GGCCGACGCC GGCGAGGAGA TAGAGATCTG GGGGGACGGC AGTCAGACCC GGTCCTTCAT
19741 CCACGTCACC GACCTGGTAC GGGCCTCACT GCGCCTGCTG GAGACCGGCA AGTACCCCGA
19801 GATGAACGTG GCCGGCGCGG AACAGGTCTC CATCCTGGAG CTCGCCCGGA TGGTGATGGC
19861 CGTCCTGGGA CGGCCCGAGC GCATCCGCCT CGACCCCGGC CGCCCCGTCG GCGCCCCGAG
19921 CAGACTTCTG GATCTGACCA GGATGTCGGA AGTGATCGAC TTCGAGCCCC AGCCCCTGCG
19981 GACCGGGCTG GAAGAGACCG CTCGCTGGTT CCGCCACCAC ACGCGCTGAA CCTCCTCTCA
20041 TACCCCCCTG GAAGGTAACT CGTGGTCACA CACGCCCCGA ACTCGCTGAT CAGTGACATA
20101 ATCCGCGCCT CCGGCGGGCA TGACGCCGAC CTCAAGGACC TGGCCGCCCG ACACGATCCG
20161 GCCGACATCG TCCGCGTACT CCTGGACGAG ATCACCTCAC GCTGCCCCGC TCCCGTGAAC
20221 GACGTCCCCG TCCTCGTCGA GCTGGCCGTC CGGGCGGGAG ACCGCCTCTT CCCCACCTAT
20281 CTGTACGTCC TCAAGGGCGG CCCGGTGCGC CTCGCGGCCA AGGACGAGGC GTTCGTCGCC
20341 ATGCGCGTCG AGTACGAGCT GGGCGAACTG GCCCGCGAGC TGTTCGGACC GGTGCGGGAG
20401 AACGTCACCG GCGTCCGCGG AACGACTCTC TTCCCCTACG TCGGGGACAC GGCGTCGGAA
20461 GGCGAGGAGG ACTCGGGTGC CGAGCACATC GGCACGCACT TCCTGGCCGC GCAGCAGGGC
20521 TCCCAGACCG TGCTCGCCGG CTGCCATTCC CACAAGCCGG ACCTCAGCGA GCTCTCCTCG
20581 CGCTACCTCA CCCCGAAGTG GGGCTCGCTG CACTGGTTCA CCCCCCACTA CGACCGCCAC
20641 TTCAGGAGCT ACCGGGACCA GCCCGTACGC GTTCTGGAGA TCGGCATCGG CGGCTACAAG
20701 CACCCCGAGT GGGGCGGCGG CTCCCTGCGC ATGTGGAAGC ACTTCTTCCA TCGCGGCGAG
20761 ATCTACGGCC TGGACATCGT CGACAAGTCG CACTTCGACG CGCCGCGCAT CACGACCCTG
20821 CGCGGCGACC AGAGCGACCC CGACCACCTG CGGTCGATCG CCGAGAAGTA CGGACCGTTC
20881 GACATCGTCA TCGACGACGG AAGCCACATC AACGACCACA TCCGGACCTC GTTCCAGGCA
20941 CTGTTCCCGC ATGTGCGGCC CGGCGGCCTC TACGTGATCG AGGACCTGTG GACCGCCTAC
21001 TGGTCCGGCT TCGGCGGCGA CGAGGACCCG AAGCGGTACA GCGGGACGAG CCTGGGCCTG
21061 CTCAAGTCCC TCGTCGACTC GATCCAGCAC GAGGAACTGC CGGAGGCCGG CGACCACCGT
21121 CCCAGTTACG CGGACCAGCA CGTGGTCGGC ATGCACCTCT ACCACAACCT GGCGTTCATC
21181 GAGAAGGGCA CCAACGCCGA GGGCGGCATC CCCCCGTGGA TCCCACGCGA CTTCGAGACC
21241 CTCGTCGCCG TCTCCTCCGG GGGCCACGCA TGAGGAGCCG TCGGCACCAG CCACCCGAAC
21301 ACACCCGACC GGACCGCAGG AGGCCCGCAT GCGCGTGACC CTGCTGAGCG TCGGATCCCG
21361 AGGCGACGTC CAGCCGTTCG TCGCCCTCGG CATCGGCCTC AAGGCCCGCG GCCACGACGT
21421 CACCCTGGCC GCCCCCGCCA CGCTGCGGCC ACTGGTCGAG CGCGCGGGAC TGACGTACAG
21481 GCTGTCCCCC GGGGATCCCG ACGGATTCTT CACCATGCCC GAGGTCGTCG AAGCGCTGCG
21541 GCGCGGCCCC TCGTTCAAGA ACATGCTCGC

21601 ACAGCAGGTC GTCGACGCGA TCCACGACGC 

21661 GCCCCTCACC CTGGCCGCCG CGTACGGGCA 

21721 GTGGCCCAAC AGCATGACCT CGGCCTTCCC 

21781 ACCGCTGACG TCCCTGTACA ACCGCTACAC 

21841 GTGGCGACGC CCCGAGATCG ACGGCTACCG 

21901 CGAGTCCTTG TTCCTCCGAC TGGGGCACGA

21961 CGTGCTGCCC AAGCCGCGGG ACTGGCCGCG 

22021 GGACCAGCAC GGGGAGCCGC CCGCCGAGCT
22081 GGTGGCGCTC ACCTTCGGCA GCACCTGGTC 
22141 CGCCCTCGAC GCCGTCCGTG GCGTCGGACG 
22201 CGACCTGCCC GACGACGTGC TCCGCCTGCA 
22261 GATGGCCGCG GTGATCCACC ACGGCGGCGC 
22321 AGTGCCCCAG GTCATCGTGC CGGTCTTCGC 
22381 CAGAACAGGC GTCGCCGCCC GGCCGGTCCC 
22441 GCAGAGCGTC CGCCAGGCGG TCACCGATCC 
22501 CGAACGGGTC TCCAAGGAAC GGGGAGTGGA 
22561 GGAGACGGCA CGCGCCACGG CCTGACACGG 
22621 CCCGCCGGCC GACGGGTCCC GGGACCGCGC 
22681 CGCACGGAGA GCGTGACCCG AGTCGGCGCC 
22741 TGCGCGTCAC GGTGCCAGGC ATTGGGCCCC 
22801 TCGAGAGCCA TGGTCCGGGT CAACTCCTCC 
22861 TGCTCCGCGA GACGCGACTC CTTGCCTTCG 
22921 ACCTCCCGCA GGTGATCGGG CAGAGGAGTG 
22981 ACGCGATGCA GTTCAGGACC GTTGCGGGGA 
23041 GCATCCCGCA GCGGAAGCGT CTGCCAGGCG 
23101 GCCTTCGCGG CACGCCGCAC GGCGTACTTG 
23161 GGGAACGCTT CCATGACCCC GGCGTGATAG
23221 CAGCCGGGCA CGGCCGGATC GGCCGTCCGC 
23281 GGGTCGTAGT GACCCGCCGA CAGGAATGCG
23341 GCCCGCAGCT TCGTCGCCCT GGGAAGCAGA 
23401 GAGTGTCCGG CGGGGCAGAA AAGTGCGCGG
23461 GGGCAGGCGA GGTAGCGCAC GATCCTGTTG 
23521 GGCAGACCGG CCAACCCTAA CGCAGGCGCC 
23581 GAACCACCGT GACGTGCACC AGCCGCGCGT 
23641 ATCGGCAGGT CAACCGGCGA TTCCTGCGGG 
23701 CCCCGCTGCA TTTCCTCGCT GAATTCAGCG 
23761 TGACAGCCCC TATCGATCGA CCAGGAGTTT 

23821 CGAGCGACAT AGCCGTCGTC GGCATGTCCT
23881 AGTTCTGGCA TCTGCTGACC ACCGGAGGCA 
23941 GGCGCGGCTC CCTGGACGGA GCCGCCGACT 
24001 GCCAGGCCGC CGCCGCCGAC CCGCAGCAAC 

24061 TGGAGAACGC CGGGATCGTC CCCGGCAGCC
24121 GCATCGCGGC CGACGACTAC GCGGCACTCC 
24181 ACACCGCGAC GGGCCTGAGC CGGGGCATGG 
24241 TGCGCGGTCC CAGCCTCGCG GTGGACAGCG 

24301 TGGCCTGCGA GAGCCTGCGC CGCGGCGAGT
24361 TGATCCTCGC CGAGGACAGC ACGGCGGGCA 

24421 GCCGCTGCCA CACCTTCGAC GCACGCGCCA
24481 GCGTCGTCCT CAAGCCCCTG GAGCGGGCAC
24541 TCCGAGGAAG CGCGGTCAAC AACGACGGCG 
24601 AGGCCCAGGC CGCCGTCCTG CGGGCGGCGT 
24661 TGTCCTACGT CGAACTGCAC GGTACGGGGA
24721 CTCTCGGCGC GGTCCTCGGC ACGGCCCACG 
24781 TCAAGACGAA CGTCGGCCAC CTGGAGGCGG
24841 CCCTGTGCGT CCGCGAGGGC GTGGTGCCGC
24901 CCATCCCCAT GGACCGGCTA AACCTGCGCG



GGGGATGCCC GAGGCGCCCG AGAGCTACAC 
CGCCGAGGGC GCCGACCTCA TAGTGAACGC 
CCCGCCCGCC CCGTGGGCCT CGGTGTCCTG 
GGCCGTCGAA TCCGGGCAGC GCCACCTCGG 
CCATCGCAGG GCGGCACGCG ACGAGTGGGA 
CCGACGCCTC GGCCTCCGGC CCTTCGGCGA 
CCGCCCGTAC CTGTTCCCCT TCAGCCCCAG 
CCAGAGCCAC GTCACCGGCT ACTGGTTCTG 
GGAGTCGTTC CTGGAGGACG GGGAGCCCCC 
ACTCCACCGG CAGGAGGAGG CCCTCCAGGC 
CCGACTGGTC ATGGTCGACG GACCGGACAG 
CCAGGTGGAC TACGCCACCC TCTTCCCCAG 
CGGCACGACC GCCGAGGTCC TCCGGGCCGG 
CGATCACCCC TTCTGGGCGG CTCGACTGTC 
CTTCGCCCGC TTCAGCCGAG AGGCACTGGC 
CGCGATGGCG GGCCGGGCCA GGCGACTGGG 
CACCGCCTGC GACATCCTCG AGAAGTGGGC 
CCACCGGCGG GCGGGCCCGG AAGCCGCACG 
CGCTACGCCG ACAACCGGTA GGCGGAGAGC 
GGCAGCCGCT GGATCGTCTC CAGATCCCGC 
ATGCCGACCA GGTGCGCCAG AGCCTGGTGG 
GTGGCGACGG CCGAGAAGTG CGGAGCGAGC 
TCCACCTGCA GCAGGCCGAG GGCGCCGATC 
ACAACCAGGA GAACGCCACG GGGATGGAGA 
GCGAACGTGT TGATCACCAT GCCGGCTGCG 
TCGGTCACCG CGGCCGCGAT CCGCGGATGC 
GAGATGTCCA GCAGCAGGCC CTGGGCATCG 
TGGCCCGTCC CCCCACCGAT GTCGACCACA 
CGCGCCAGAT CGACCAGCGC ATCCATCACG 
TCCCGGGCCT CCACCATTTC CTTGGTGTCG 
TTCACATAAC CCTGCTTCGC GATGTCGAAG 
TCGCCCTGAG CCAGCGAAGC ACCGCAGTGC 
AGCATGGGGC ACCGTTCCTT CGGGCGAGTC 
CGGCCGCCCA CCGCCCGGCG CAGGGCCGAC 
GAGAACTCCT CATGCGCGCA CCCTACGGAA 
AATTCCGAGC GAAACGGCCT CACTGTGTTT 
AATCCCGGCA GACGACCGGC TCTGCCGGCG 
CGATGGCCCC GAAGAGTGGT GCGCAGCGTT 
GCCGCCTTCC GGGGGCACCG GGCATCGATG 
GCGCGATCGA GCGTCGCGCC GACGGCACCT 
TCGACGCCGC CTTCTTCGAC ATGACCCCCC 
GACTCATGCT GGAACTCGGC TGGACGGCCC 
TCGCCGGCAC GGACACCGGC GTCTTCGTCG 
TGCACCGGTC CGCCACCCCC GTCAGCGGGC 
CCGCCAACCG CCTCTCCTAC CTCCTGGGCC 
CGCAGTCCTC CTCGCTCGTC GCGGTCCACC 
CCGACCTCGC GATCGTCGGC GGCGTCAGCC 
TGGAGCTCAT GGGCGCGCTC TCGCCGGACG 
ACGGCTACGT ACGCGGTGAG GGCGGAGCCT 
TGGCCGACGG GGACCGCGTC CACTGCGTCG 
GCGGCTCCAC CCTGACCACC CCCCACCGCG 
ACGAACGGGC CGGGGTCGGC CCGGACCAGG 
CGCCGGTCGG CGACCCCGTC GAGGCGGCGG 
GCCGTAACGC CCCGCTGTCC GTGGGATCGG 
CCGCGGGCCT CGTGGGATTC GTGAAGGCAG 
CGAGCCTGAA CCACGCGACG CCCAACCCTG 
TACCCACGCG ACTGGAGCCC TGGCCGCACC 
24961 CGGACGACCG AGCGACCGGC CGGCTGCGAC
25021 GGACGAACGC GCATGTGGTG GTGGAGGAGG 
25081 GGGCCGGTGT GCCTTTGGCT GTGGTGCCGG 
25141 TGGCTGAACT GGCCTCCCGC TTGAACGAGT
25201 GGTTGTCGTC GGTGGTGTCG CGGTCGGTGT 
25261 ACTCTGCCGA GCTGAGTGCC GGTTTGGATG 
25321 TGGTGTCGGG TGTGGCGTCG GTCGGGGGTG
25381 GGGTGAAGTG GGCGGGGATG GCGCTCGGGT 
25441 CGATGGCGCG GTGTGAGGCG GCGTTCGCCG 
25501 TGGGTGATGG GGCTGCGTTG GAGCGTGAGG 
25561 TGGTGTCGTT GGCGGCGTTG TGGCGGTCGT 
25621 ATTCGCAGGG GGAGATCGCT GCTGCGGTGG 
25681 CGCGTGTGGT GGTGTTGCGT GCGCGGGTGG
25741 CTTCGGTGCG TCTTTCGCGG GCCGAGGTGG 
25801 TGAGTGTGGC GGTGGTGAAT GCGCCGTCGT 
25861 TGGATCGGTT TGTTGCGGCG TGTGAGGCGG 
25921 GGTATGCGTC GCATTCGAGG TTCGTGGAGC 
25981 CCGATGTCCG TCCGGTGAGG GGGCGGATTC 
26041 TCGATACGGC TGGTCTGGAT GCGGAGTACT
26101 TCCAGGAGAC GGTCGAGCGG CTGTTGGCGG 
26161 CGCATCCGGT GCTGACCGGG GCGGTGCAGG 
26221 GCTCCGTCGG ATCCCTGCGT CGTGACGAGG 
26281 CGGAGGCGTT CGTCCAGGGG GTGGAGGTGT 
26341 CCCGGACGGT CGACCTGCCC ACCTACCCCT 
26401 GCTCCGCGTC GGCGGCGCCC ACACGGGACA 
26461 CGGACACGAT GGACCTCGCG GGACAACTTC 
26521 AGCAGATCGC GCGGTTGCTC GACCAGGTAC 
26581 ACGCCCGGGA CGAGGTGCGC GCGGAGGCGA 
26641 CCGGCGTCGA GCTGAAGAAC CACCTCCGTG 
26701 TGATCTACGA CTGCCCGACT CCCCTCGCCG 
26761 GCCGACCCGC GGAGCAGGCC GTCGTCCCCG 
26821 TCGTGGGGAT GGGGTGCCGT CTGCCGGGTG 
26881 TGGTGGCGTC GGGGGTGGAT GCGGTTTCTC 
26941 GGGGTCTGTT CGATCCGGAG CCGGGGGTGC 
27001 TCCTTCATGA GGCGGGGGAG TTCGACGCGG 
27061 TGGCGATGGA TCCGCAGCAG CGGTTGTTGC 
27121 CGGGCATCGA TCCGCACACG CTTCGCGGTT 
27181 CCCAGGAATA CGGCCCCCGA CTCCACGAAG 
27241 CCGGATCCTC CAGCAGTGTC GCCTCGGGTC 
27301 CGGCGGTGAC GGTGGACACC GCGTGCTCGT 
27361 GGGCGCTGCG CAGCGGTGAG TGCGACCTCG 
27421 AACCCGGCAT GTTCGTGGAG TTCTCACGGC 
27481 AGGCGTACTC GGACTCGGCC GATGGCACGG 
27541 TCGAGCGGTT GTCGGATGCG GTACGTCATG 
27601 CCGCGGTCAA CCAGGACGGT GCGAGCAACG 
27661 CGCGTTTGAT CCGTCAGGCG TTGGCGGATG 
27721 TGGAGGGCCA CGGCACGGGG ACGCGTCTGG 
27781 CGACGTATGG GCAGCGGGAT GCGGGTCGGC 
27841 TGGGGCATAC GCAGGCGGCT GCCGGTGTGG 
27901 GGCACGGTGT CCTGCCGAAG ACGCTGCACG 
27961 CGGCCGGCGC GGTGTCCCTG CTGAGGGAGC 
28021 GGCGGGCCGG AGTCTCCTCG TTCGGCGTGA 
28081 AGGCGCCGGT TCCGGAGGAC GGGGAGGCGA
28141 CGGTGGTGGT GTCGGGTCGT TCTGCGGGTG
28201 AGGTCGCTGG GTCTGGTCGG TTGGTGGATG 
28261 TGTTCGAGCA CCGGTCCGTG GTGCTGGCGG
28321 ATGCTCTGGC CGCTGACGGA GTGTCTCCTG 



TGGCCGGCGT CTCGTCCTTC GGCATGGGTG 
CGCCGCTTCC GGAGGCCGGG GAGCCGGTCG 
TGGTGGTGTC GGGTCGTTCT GCGGGTGCGG 
CGGTTCGTTC GGATCGGTTG GTGGATGTGG 
TCGAGCATCG GTCCGTGGTT CTGGCGGGGG 
CTCTGGCCGC TGACGGAGTG TCTCCTGTCC 
GCCGGTCGGT CTTCGTGTTC CCGGGTGCGG 
TGTGGGCGGA GTCTGCTGTG TTTGCGGAGT 
GGTTGGTGGA GTGGCGTCTG GCGGATGTGC 
ACGTGGTGCA GCCGGCGTCG TTCGCGGTGA 
TGGGTGTGGT GCCGGATGCG GTGGTGGGGC 
TGGCGGGTGG TCTGTCGTTG GAGGACGGGG 
CTGAGGAGGT TTTGTCGGGT GGGGGGATTG 
AGGAGCGGTT GGCGGGTGGG GGTGGTGGGT 
CGACGGTGGT GGCGGGTGAG TTGGGGGATT 
AGGGGGTGCG GGCGCGTCGG CTGGAGTTCG 
CGGTGCGTGA GCGGTTGTTG GAGGGGTTGG 
CGTTCTATTC GACGGTGGAG GCTGCGGAGT 
GGTTCGGGAA TCTGCGTCGG CCGGTTCGCT 
ATGGTTTCCG GGTGTTCGTG GAGTGCGGCG 
AGACCGCGGA GACTGCGGGC CGGGAGATCT 
GTGGACTGCG TCGCTTCCTG ACCTCTGCGG 
CCTGGCCGGT GCTGTTCGAC GGCACCGGCG 
TCCAACGCCG GCACCACTGG GCACCCGACG 
TCCGACCGGA CGAGACCGCC GCGGTTCCAG 
GCGCGGACGT GGCGTCGTTG CCCACCACCG 
GCGACGGTGT CGCCACGGTC CTCGGACTGG 
CGTTCAAGGA ACTGGGCGTC GAATCGCTGA 
CCCGGACCGG ACTGCACGTC CCCACCTCGC 
CCGCTCACTA CCTCCGCGAC GAGCTCTTGG 
CCGGCATCCC GGTCGACGAA CCGATCGCGA 
GGGTGTCGTC GCCGGAGGGG TTGTGGGATC 
CGTTCCCCAC GGATCGGGGT TGGGATGTGG 
CGGGGCGTTC GTATGTGCGT GAGGGCGGGT 
GGTTCTTCGG TATCTCTCCG CGTGAGGCGT 
TGGAGACGTC GTGGGAGGCG TTGGAGCGGG 
CACGGACCGG CGTCTACGCC GGAGTGATGG 
GCGCAGACGG ATACGAGGGC TATCTGCTGA 
GTATCTCGTA CGTGCTGGGT CTGGAAGGGC 
CGTCGCTGGT CGCGCTGCAC CTGGCCGTGC 
CCCTCGCCGG CGGCGCGACC GTCATGGCCG 
AGCGCGGGCT GTCTGCCCAC GGACGGTGCA 
GCTGGGCCGA GGGGGCGGGT GTGCTGCTCG 
GGCGTCGGGT GCTGGCGGTC GTGCGTGGTT 
GACTGACGGC GCCGAACGGG CGGTCCCAGT 
CGCGGTTGGG TGTGGCTGAT GTGGATGTGG 
GTGATCCGAT CGAGGCGCAG GCGTTGTTGG 
CTCTGCGGCT TGGTTCGCTG AAGTCGAACG 
CGGGCGTGAT CAAGATGGTC ATGGCGATGC 
TGGATGAGCC GACGGCGGAG GTGGACTGGT 
AGGAGGCGTG GCCGCGTGGC GAGCGTGTGC 
GTGGGACGAA CGCGCATGTG GTGGTCGAGG 
TCGAGGGCGG TGCGCCTTTG GCTGTGGTGC 
CGGTGGCGGA GCTGGCGGGC CGGGTCAGCG 
TGGGGTTGTC GTCGGTGGTG TCGCGGTCGG 
GGGACTCTGC CGAGCTGAAT GCCGGTCTGG 
TCCTGGTGTC GGGTGTGGCG TCGGGTGAGG 
28381 GTGGCCGGTC GGTGTTCGTG TTCCCTGGTC AGGGGACGCA GTGGGCGGGG ATGGCGCTCG
28441 GGTTGTGGGC GGAGTCGGCG GTGTTCGCGG AGTCGATGGC GCGGTGTGAG GCGGCGTTCG 
28501 CCGGGTTGGT GGAGTGGCGT CTGGCGGATG TGCTGGGTGA CGGGTCTGCG TTGGAGCGGG 
28561 TCGATGTGGT GCAGCCGGCG TCGTTCGCGG TGATGGTGTC GCTGGCGGAG TTGTGGCGGT 
28621 CGTTGGGTGT GGTGCCGGAT GCGGTGGTGG GGCATTCGCA GGGGGAGATC GCTGCTGCGG
28681 TGGTGGCGGG TGGTCTCTCG CTGGAGGATG GCGCGCGTGT GGTGGTGTTG CGTGCGCGGT 
28741 TGATAGGCCG TGAGCTGGCC GGGCGCGGTG GGATGGCGTC GGTGGCGCTG CCCCTCGCGG 
28801 TGGTGGAGGA GCGTCTGGCG GGGTGGGCGG GGCGTCTGGG TGTGGCGGTG GTCAACGGAC 
28861 CCTCCGCCAC GGTCGTCGCG GGTGATGTGG ATGCGGTGGG GGAGTTTGTG ACCGCGTGCG
28921 AGGTGGAGGG GGTTCGGGCG CGTGTTCTGC CGGTGGACTA CGCCTCGCAC TCGGCGCACG
28981 TGGAGGACCT GAAAGCCGAG CTTGAACAGA TTCTGGCCGG CATCGGCCCG GTGACCGGTG
29041 GCATCCCGTT CTATTCGACG TCCGAAGCCG CGCAGATCGA CACGGCTGGT CTGGACGCGG
29101 GGTACTGGTT CGGGAATCTG CGTCGGCCGG TGCGGTTCCA GGAGACGGTC GAGCGGTTGC
29161 TGGCGGATGG TTTCCGGGTG TTCGTGGAGT GTGGCGCGCA TCCGGTGCTG ACGGGGGCGG 
29221 TGCAGGAGAC CGCGGAATCC ACCGGTCGCC AGGTGTGTGC GGTCGGATCC CTGCGTCGTG 
29281 ACGAGGGAGG TCTGCGCCGC TTCCTGACCT CTGCGGCGGA GGCGTTCGTC CAGGGGGTCG 
29341 GGGTGTCCTG GCCGGCACTG TTCGACGGCA CCGGCGCCCG CACGGTCGAC CTGCCCACCT
29401 ATCCCTTCCA GCGTCGGCGT TACTGGCTGG AGTCACGTCC TCCTGCGGCG GTTGTTCCGT 
29461 CGGGGGTCCA GGACGGATTG TCGTATGAGG TGGTGTGGAA GAGCCTGCCG GTACGGGAGT 
29521 CGTCGCGTCT TGACGGCCGG TGGCTGCTCG TCGTGCCCGA AACCCTGGAC GCCGACGGCA 
29581 CGCGGATCGC CCACGACCTC CAGCACGCCC TCACCACCCA CGGCGCCACG GTCTCCCGTG 
29641 TGTCGGTCGA CGTGACGACG ATCGACCGCG CCGACCTGTC GGCGCGGCTC ACCACGAGCG
29701 CGGCCGAAGA CCAGGAACCG CTTGGGCGAG TGGTGTCCCT CCTGGGGTGG GCCGAGGGAG 
29761 TACGGGCCCA TGGCCCGAAC GTACCGACTT CCGTCGCCGC CTCCCTGGCA CTCGTGCAGG 
29821 CTGTCGGCGA TGCCGGGTTC GGTGTTCCGG TGTGGGCGGT GACGCGGGGT GCGGTGTCCG 
29881 TGGTGCCTGG TGAGGTGCCG GAGACGGCGG GTGCGCAACT GTGGGCGCTC GGCCGGGTCG 

29941 CCGGTCTCGA ACTTCCCGAC CGTTGGGGTG GTCTGATCGA TCTTCCGGCG GATGCCGATG
30001 CGCGTACGGC GGGGCTTGCG GTGCGGGCCC TGGCCGCCGG GATCGCCGAT GGTGAGGACC 
30061 AGGTGGCGGT GCGCCCCTCG GGTGCCTACG GCCGGCGTGT AGTTCAGGCA GCCCACCGGG 
30121 AGCCGTCGGG AGCGAAGACG GAGTGGCGAC CGCGTGGCAC CGTGCTCGTC ACCGGGGGAA 
30181 TGGGCGCCAT CGGCACTCGG GTGGCCCGCT GGCTGGCCCG GAACGGAGCC GAACACCTGG 
30241 TGCTGACCGG CCGCCGCGGT GCCGGGACCC CCGGCGCGGA CGAGCTGGCG GGAGAGCTGA 
30301 GGGCGTCCGG AGTCCAGGTC ACGCTCGCCG CCTGCGACGT GTCCGATCGT GCCGCGCTGG 

30361 CCGCGCTGCT CGACGCGCAT CCGCCGACCG CCGTCTTCCA CACGGCCGGT GTACTGAACG
30421 ACGGAACGGT CGACACGCTC ACTCCCGCTC ACCTGGACGG GGTCCTGAGC CCCAAGGCGA
30481 CGGCCGCCGT TCACCTGCAC GAGCTCACCG CCCACCTGGA CCTGGACGCC TTCGTCCTCT
30541 TCGCCTCCGT CACCGGCGTA TGGGGTAACG GCGGCCAGGC CGGGTACGCC ATGGCCAACG
30601 CGGCTCTGGA CGCGCTCGCC GAGCAGCGCC GTGCCGGCGG ACTTGCGGCG ACCTCCATCA
30661 GTTGGGGCCT CTGGGGTGGC GGCGGCATGG CCGAGGGTGA CGGCGAGGTG AGCCTCAACC 
30721 GTCGTGGAAT CCGCGCTCTT GAGCCCGCCA CCGGCATCGA GGCGCTGCAG CGGACGCTCG
30781 ACCAGGGCGC CACCTGCCGC ACCGTCGTCG ACGTGGACTG GGGTCAGTTC GCTCCTCGTA
30841 CGGCGGCGCT GCGGCGCGGG CGGCTCTTCG CGGATCTGCC CGAGGTGCGG CGTGTCCTGG
30901 AGTCCGAGGG GGTTGCACGG GAGGACGCCG GAACCGTCGA GCCCGGCGCC GTGCTCGCCG 
30961 AGCGCCTCGC ATCGCGCTCC GAGGCCGAAC AGCGACGCAT GCTCGTCGAG TTGGTACGAG 
31021 CCGAAGCGGC TGCCGTCCTG CGTCACGACA CGACGGACCT CCTGGCGCCG CGCAGGTCGT 
31081 TCAAGGACGC CGGGTTCGAC TCCTTGACCG CGCTGGAGCT CCGTAACCGG CTGAACACCG 
31141 CCACCGGTGT CGTCCTTCCC GTCACCGTCG TCTTCGACCA CCCGAACCCC GGTGCACTGG 
31201 CGGACTTCCT GTACGGCGAA GCACTGGGCC TGTCCGCGGC CAGGTCTTCC GCGAGCGATA 
31261 CGGCCGACAC GACCCGCCCG GCCGCCGCCC CCGAAGAGCC GATCGCGATC GTCGGAATGG 
31321 CCTGCCGCTA CCCGGGCGAG GCCCGTTCCC CCGAGGAACT GTGGAAGTTG CTCATCGACG 
31381 AACGGGACGT CATCGGCCCC ATGCCCACGG ATCGGGGTTG GGATGTGGGG GGTCTGTTCG 
31441 ATCCGGAGCC GGGGGTGCCG GGGCGTTCGT ATGTGCGTGA GGGCGGGTTC CTTCATGAGG 
31501 CGGGGGAGTT CGACGCGGGG TTCTTCGGTA TTTCTCCGCG TGAGGCGTTG GCGATGGATC 
31561 CGCAGCAGCG GTTGTTGCTG GAGACGTCGT GGGAGGCGTT GGAGCGGGCG GGCATCGATC 
31621 CGCACACGCT CCGCGGCTCA CAGACCGGCG TCTACGCGGG GATGTTCCAC CAGGAGTACG 
31681 CGACCCGGCT GCACGAGGCA CCCGTGGAGT TCGAAGGCCA CTTGCTGACG GGGACGTCCG 
31741 GGAGTGTGGC TTCGGGTCGT ATCTCGTATG TGCTGGGTCT GGAGGGGCCG GCGGTGACGG 
31801 TGGACACGGC GTGTTCGTCG TCGTTGGTGG
31861 GTGGTGAGTG TGACCTTGCT CTTGCGGGTG 
31921 TCGTGGAGTT CTCGCGGCAG CGGGGGTTGG 
31981 CTGCTGCCGA TGGCACGGGC TGGGCCGAGG 
32041 CGGATGCGGT GCGTCATGGG CGTCGGGTGC 
32101 AGGACGGTGC GAGCAATGGG TTGACGGCTC 
32161 GTCAGGCGTT GGCGGATGCG CGGTTGGGTG
32221 GGACGGGGAC GCGTCTGGGT GATCCGATCG 
32281 AGCGGGATGC GGGTCGGCCT CTGCGGCTTG 
32341 AGGCGGCTGC CGGTGTGGCG GGCGTGATCA 
32401 TGCCGAAGAC GCTGCACGTC GATGAGGTCT 
32461 TGTCCCTGCT GACGGAGCAG GAGCCGTGGC 
32521 TCTCCGCGTT CGGTGTGAGT GGGACGAACG 
32581 CGGAGGCGCC GGTCGCGGTG GAGCCGGTGG 
32641 TGCCCGTGGT GGTGTCGGGT CGTTCTGCGG 
32701 ACGAGTCGGT TCGTTCGGAT CGGTTGGTGG
32761 CGGTGTTCGA GCATCGGTCC GTGGTTCTGG 
32821 TGGATGCTCT GGCCGCTGAC GGAGTGTCTC 
32881 GGGGTGGCCG GTCGGTGTTC GTGTTCCCGG 

32941 TCGGGTTGTG GGCGGAGTCT GCTGTGTTTG
33001 TCGCCGGGTT GGTGGAGTGG CGTCTGGCGG 

33061 GTGAGGACGT GGTGCAGCCG GCGTCGTTCG
33121 GGTCGTTGGG TGTGGTGCCG GATGCGGTGG 
33181 CGGTGGTGGC GGGTGGTCTG TCGTTGGAGG 
33241 GGGTGGCTGA GGAGGTTTTG TCGGGTGGGG 
33301 AGGTGGAGGA GCGGTTGGCG GGTGGGGGTG 
33361 CGTCGTCGAC GGTGGTGGCG GGTGAGTTGG
33421 AGGCGGAGGG GGTGCGGGCG CGTCGGCTGG 
33481 TGGAGCCGGT GCGTGAGCGG TTGTTGGAGG 
33541 GGATTCCGTT CTATTCGACG GTGGAGGCTG
33601 AGTACTGGTT CGGGAATCTG CGTCGGCCGG 
33661 TGGCGGATGG TTTCCGGGTG TTCGTGGAGT 
33721 TGCAGGAGAC CGCGGAGACT GCGGGCCGGG 
33781 ACGAGGGAGG TCTCCGTCGC TTCCTGACCT
33841 AGGTGTCCTG GCCGGTGCTG TTCGACGGCA 
33901 ACCCCTTCCA ACGCCGCCAC TACTGGGCAC 

33961 CTGCGGCCCG CTTCGGTATG ACCTGGGAGG

34021 TCGCGGACTC CGGCGAGCTG CTGCTCGTCG
34081 TCGCCGACCA CACCGTGGCC GGGACCGCCC 
34141 CACTGCACGC CGCCGCGGTC GCGGGCTGTG 
34201 CGCTGCCGGT GCACGGCGGG ATCCGGCTCC 
34261 CGCGGCGCCG CGTGAGCGTG TTCGCGAGGC 
34321 GGACCCGACA CGCCACCGGC GTGCTGACCC 
34381 AGTGGTGCCG GCAGGCCTGG CCGCCGAGCG 
34441 ACGACCGGTT CTCCGCGCTG GGATACGACT 
34501 TCTGGCTGCG CGAGGGCGAG GCCTTCGCCG 
34561 ACGCCGAGCG GTTCGGGGTG CACCCCGGCC 
34621 TGGGCGACTT CCTGTCGCGG CCCGACGGCG
34681 GCATCACGCT CCACACGGCC GGTGCCGACG 
34741 AAGGCGCTCT GTCGCTCGAA GCCGTCGACC 
34801 CACTGGTGCT GCGTCCGCTC GCCCAGGACC
34861 CCACCCCGCT GTACCGCGTG GACTGGCAGC 
34921 CCACGGGGCT CTTCGGCTCC CTCCCGTCCG
34981 AGGGCGGTGT CGCCGCACGG TACGCGACGG
35041 TCCCCGACCT GGACGCACTG CGTACGACGC
35101 TCCTCGCCGA CTTCGGGGCC CGGCCAGGCG 
35161 ACGGCGCACG CGACACGGTC CGGCGGGGGC 



CGCTGCATCT GGCGGTGCGG GCGTTGCGGA 
GGGTGACGGT GATGGCGGAG CCGGGTGTGT 
CTGCGGACGG GCGGTGCAAG GCGTTCGCGG 
GTGTGGGCGT GCTGGCGGTG GAGCGGTTGT 
TGGCGGTGGT GCGTGGCTCG GCGGTGAATC 
CGAACGGTCC GTCGCAGCAG CGGGTGATTC 
TGGCTGATGT GGATGTGGTG GAGGGGCATG
AGGCGCAGGC GTTGTTGGCG ACGTATGGGC 
GTTCGCTGAA GTCGAATGTG GGGCATACGC 
AGATGGTCAT GGCGATGCGG CACGGGGTCC 
CTCCGCACGT CGACTGGTCG GCGGGCGCGG 
CCCGTGGTGA GCGCGTACGG CGGGCTGGTG 
CGCATGTGGT GGTGGAGGAG GCACCTGCTT 
AGCCGGGGGC TGTGGGGCTT CTTCCGGTGG 
GTGCGGTGGC TGAACTGGCC TCCCGCTTGA 
ATGTGGGGTT GTCGTCGGTG GTGTCGCGGT 
CGGGGGACTC TGCCGAGCTG AGTGCCGGTT 
CTGTCCTGGT GTCGGGTGTG GCGTCGGTCG 
GTGCGGGGGT GAAGTGGGCG GGGATGGCGC 
CGGAGTCGAT GGCGCGGTGT GAGGCGGCGT 
ATGTGCTGGG TGATGGGGCT GCGTTGGAGC 
CGGTGATGGT GTCGTTGGCG GCGTTGTGGC 
TGGGGCATTC GCAGGGGGAG ATCGCTGCTG 
ACGGGGCGCG TGTGGTGGTG TTGCGGGCGC 
GGATTGCTTC GGTGCGTCTT TCGCGGGCCG 
GTGGGTTGAG TGTGGCGGTG GTGAATGCGC 
GGGAGTTGGA TCGGTTTGTT GCGGCGTGTG 
AGTTCGGGTA TGCGTCGCAT TCGAGGTTCG 
GGTTGGCCGA TGTCCGTCCG GTGAGGGGGC 
GGGAGTTCGA TACGGCTGGT CTGGATGCGG 
TTCGGTTCCA GGAGACGGTC GAGCGGTTGT 
GTGGTGCGCA TCCGGTGCTG ACCGGGGCGG 
AGGTGTGTGC GGTTGGTTCG CTGCGTCGTG 
CTGCGGCGGA GGCGTTCGTC CAGGGGGTGG 
CCGGCGCCCG CACGGTCGAC CTGCCCACCT 
AGTCCTCGCC CGCCGGCGCC GGCAGCTCTG 
AGCATCCGCT CCTCGGCGGC GCGCTGCCGC 
GGAGGATCTC CCCGGCCTCC CACTCCTGGA
TGCTGCCCGG GACGGCCTTC GTCGACATGG 
CGGGTGTGGA GGAGCTGAGC ATCGAGGCCC 
AGGTGGTGAT CGACGAGCCC GACTCCTCCG 
CGGAAGAGGA AGACGGGGAC GCCGGCCGCT 
CCGACGTCGC CCCCGAGCCG GGCCGGCCGC 
GCTCCGTCCG GGTGGAGGCG TCGGAGCTCT 
ACGGCGAGGT CTTCGCCGGG GTCGAGGCCG 
AGGTCCGCCT GCCCACGGGC GCGGCGCCCG 
TCCTCGACGC GGCTCTGCAC CCCTGGCTGC 
GATCCGTACT GCTGCCGTTC GCGTGGCGCG 
CGCTGCGGGT CCGTCTCGGA CCGGCCGGAG 
TCTCCGGTGC CCCGGTGCTG TCGATGGACG 
GCCTGGCGGA ACTGGTCGGC GGCACGACCT 
GGAGCCCGAT CGCGAGGACG GCGCCGTCGG 
GTGCCGTCCG CCGCTGGGCG GTCGTCGGGC 
CGGAACCCGG CACGGGGTGC GTCGGGGTCT 
TGGACTCCGG AGCGGACGGC CCCGACATCG 
ACGCCGCGCC GCACGGGACG GATCCGGCCG 
TCGCCCTCAT ACAGGGCTGG CTGTCCGACG 






35221 AGCGCTTCGC CGCCGCGCGT CTCGCCGTGC TCACCGAGCA
35281 ACACCCGCCG GACGGACCTC GCGGGCTCGG CACTGTGGGG 
35341 CCGAGCACCC CGACCGCTTC GTCCTCGTCG ACCACGACGG 
35401 CGCTGCCCAC CGCGCTCGAC AGCGAAATCC CGCAACTGGC 
35461 TGGCCCCCGA ACTGGCGGTC CTGCCGAGTC CGGCCGACGG 
35521 CGTTCGATCC CGAAGGCACG GTACTCGTCA CCGGAGCCAC 
35581 TGGCCCGGCA CCTGGTCACG GCACACGGCU TACGGCATCT 
35641 GACGCGAAGC CGCGGGGGCC GCCGAACTGG AGCGTGAACT 
35701 TCCAGCTCCT CTCCTGTGAC GCGACGGACC GGGCAGCGCT 
35761 TCCCCGCCGC CCACCCGCTG ACCGCGGTGA TCCACACGGC 
35821 TCGTCGAGGC GCTGACCCCC GAACGGCTGG ACCGCGTGCT 
35881 CGCTGAACCT GCACGACCTG ACCGAGGGAA TGCCGCTGAA 
35941 GGGCGGTCGG ACTGCTGGGT GGAGCGGGCC AGGCCAACTA 
36001 TGGACGGCCT GGCCCAACAC CGGCACGCGC AAGGGCTGCC 
36061 GACTCTGGAG CGCCACCAGC ACGTTCACCG ACCATCTCGG 
36121 TGGAGCGGTC CGGCATCACG CCGCTCACGG ACGAGCAGGG 
36181 CCCTCGGCGC CGCGGTGGAC GCGCCGCAGC TCTGCGTGAT 
36241 TGCGCCGGCA GGCGGCCGAG CACGGGCCGA CTTCGATGCC 
36301 CGGCGCCTCC CGTACGGCGC GGCGCGGGGC GCTCCGGCCG 
36361 CCACGGACGC GCCGAGCCGG GCGCAGGCCC TGCGCGAGCG 
36421 CGGCACGGCG GGACGAACTC CTGGTCCTGT CGCAGGCGCA
36481 TCGCCGACAA GACCGCGGTG GACCCCGTTC GTTCCTTCCG 
36541 TGACCGCCGT CGAGCTGCGC AACCGGCTCG GTGTCGTCAC 
36601 CGCTGGTCTT CGACCACCCC AACCTCGACG CGCTCGCGGC 
36661 CGGCTGAGGG CCGGGACGAC GCGGGCGCCG CGGCGCTCTC
36721 GGGCGGTCCG GGAGATGGCG GCCGACGACA CGCGCCGTGA
36781 CGGAACTGCT GGCGGTGGTC GGCGACGCCC CGCGGGACGG
36841 CCGCCGACGC GGGAGGCCGC GACGCTCAAG CGGACCCCGA 
36901 CCGCCTCCGA CGACGATCTG TTCGCGTTCA TCGAAGACCA 
36961 CGCGTTCCCC ACCCGTCCCT AAGCGCCGCA TCAGGCGCAC 
37021 AGGCCAGGAG GGTCCGGTCG ATGACGGCCA ACGATGACAA 
37081 GGGTCGTCGC CGAGCTGCAC AGCACGCGGC AACGGCTCAA 
37141 GCGAGCCCAT CGCCATCGTG GGGATGAGCT GCCGGCTGCC 
37201 AGAGCCTGTG GAGGCTGGTC GACTCCGGCA CCGACGCCGC 
37261 GGGGCTGGGA CCTGGACGCG CTCCACCATC CGGAGTCGGG 
37321 GCGGATTCCT CCACGACAGC GCGGACTTCG ACGCGGAGTT 
37381 AGGCCCTGGC CATGGACCCG CAGCAGCGGC TGCTGCTGGA 
37441 AGCGCGCCGG CATCGACCCG GTCTCCGCCC GCGGCTCCCG 
37501 TCATGTACCA CGACTACGGC GCCCGGCTGA ACGAGATCCC 
37561 TGGTCAACGG CAGCGCGGGC AGCATCGCCT CGGGCCGGGT 
37621 AGGGCCCCGC CGTCACCGTC GACACGGCCT GCTCCTCGTC 
37681 CGGCACAGGC ACTGCGGCGG CGGGAGTGCG ACATGGCGCT 
37741 TGTCCACCCC CGACCTGTTC ATCGACTTCG CGCGACTCGG 
37801 GCTGCAAGGC GTTCTCCGAC GCCGCCGACG GCACCAGCTT 
37861 TGCTCCTCAT GCGGCTGTCG GACGCGGTGG CCGAGGGCCA 
37921 GAGGCTCCGC CGTCAACCAG GACGGGGCGA GCAACGGCCT 
37981 CCCAGCAACG CGTGATCCGC GAGGCGCTCG CCGACGCGGA 
38041 ACGCGGTGGA GGCGCACGGC ACCGGAACCA GGCTCGGCGA
38101 TGCTGCACAC GTACGGCACG AGCCGCAGCC CCGAACGACC
38161 AGTCCAACAT CGGCCACACC CAGGCCGCCG CCGGAGTGGC 
38221 TGGCGATGCG CCACGGACGG CTGCCCCGCA CACTGCACGT 
38281 TGGAATGGTC GGCGGGCGCG GTGGAACTGC TCACGCGGGC 
38341 GGAACGCCCC GCGCCGCGCC GGAGTGTCGT CCTTCGGTGC
38401 TGATCCTGGA AGGCGTCCCG GACGGCGACA TCACGGTCGC 
38461 GCGGCGGCGC CTGGCCGCTC GCGGGCCGGA CCGAAGCGGC
38521 GGCTCCACGA CCACCTCGCC GCCCGCCCCC ACGTCTCACC
38581 TGGTCCGCTC CCGCACGGCG TTCGAGCACC GGGCCGTCGT



CGCCGTCGCC ACCGAGGCGG 
GCTGATGCGT TCGGCGCAGA 
GCAGGACGCC TCGTACCGGA 
GCTCCGAGCC GGGGAGACGC 
GGGGCCCGCG ACAAGCGCGG 
CGGCACCCTC GGCAGCCTGC 
GCTGCTGCTC ACCCGCAGCG 
CCGTCAACGG GGAGCCGAGT 
GAAGGAGGCC CTCGCCACCG 
CGGCGTCCTC GACGACGGCG 
GCGCCCCAAG GCGGACGCCG 
GGCGTTCGTC CTGTACTCCG 
CGCGGCAGCC AACGCGTTCC 
CGCGGTGTCC CTGGCATGGG 
CGAGGTGGAC CTGCGGCGCA 
CCTTGACCTG TTCGACCGGG 
GGGGCTGGAC ACGGCGGCGC 
TCCTCTGCTG CGTACGCTGG 
GGGCGGACGG GCGGCGTCCG 
ACTGACGGGC C7GGACGCGG 
GTTGGCCGAT GTGCTGGGCT 
CGAGATCGGT CTGGACTCGC 
CGGACTGCGG CTGCCGCCGG 
CCACCTGGCG GAGCTCCTCG 
GGGAATCGAC GCCCTGGACC 
CGCCGTCCGC CGACGCCTCG 
CGGCCGCGCC CCACGGGCGG 
CCTGCTGGGC CGGCTGGACT 
GCTGTGAGCG GGACGCCGCG 
TCGCACCGAC ACGAGCACGC 
GATCCGCGAC TACCTGAAGC 
CGCCTTGGAG CACGACGCCC 
CGGCGGGGTG ACCACCCCCG 
CTCGCCGTTC CCCGACGACC 
AGCCGTCCAC TCCCGCGAGG 
CTTCGGCATC AGCCCGCGAG 
GACCGCCTGG GAGGTGTTCG 
CACGGGGGTG TACGCGGGCG 
GCCGGGCCTC GAGGGCTACC 
GGCCTACACC CTCGGTCTGG 
ACTGGTCGCC GTGCACCTGG 
CGCGGGCGGC GCGACCGTCC 
CGGCCTGGCC TCCGACGGGC 
CGCCGAGGGC GCCGGCCTGC 
CACCGTCCTG GCGGTCGTCC 
GACGGCCCCC AACGGCCTCG 
CCTGGACCCC GACCAGATCG 
CCCCATCGAG GCGCAGGCCC 
CCTGTGGCTC GGTTCGCTGA 
CGGAGTCATC AAGACGGTGC 
CACCCGCCCC TCCAGCCGGG 
ACAGGACTGG CCCGGCCAGG 
CAGCGGCACC AACGCACACC 
GGAGACCCGA CCGGCCACCG 
CCTGCGCGCA CAGGCCCGGC 
CGCCGCGGTC GGGCGGACGC 
CCTCGGACAG GACACCGCCG 
38641 ACCTCCTGAG CGGCCTCGCG GAGCTGGCGT
38701 CCGGCCGGGC CGCGCGCGGC CGCCGTACCG
38761 GGCCGGGTGC CGGACGGCAT CTCTACGAGC 
38821 AGACGGCCGC GGCACTCGAC CGGCACCTCG 
38881 AGCCGGGCGG CGCGACGGCC GGACTCCTCG 
38941 TCGCCCTGGA AGTCGCCCTC TTCCGACTGG 
39001 TCCTCGGGCA CTCCGTCGGC GAACTGUCCG
39061 CCGACGCCGC CCGCCTGGTG ACGGCGCGAG 
39121 GCGCCATGAT GGCGATCCAG GCGAGCGGCC 
39181 CGGCCCACCG GTCGGCGCGC GTCGCCGTCG 
39241 TCTCGGGCGA CGAGGACGTG GTCGCCGAAC 
39301 GCACCAGGGC GCTGCCCGTC AGCCACGCCT
39361 AACCGTTCGC CCGGATCGCG CGCGACGTCT 
39421 CCAACCTGAC CGGCGGCATC GCGTCGGCCA 
39481 GCCACGCGCG CGAGGCCGTG CGCTTCAGCG 
39541 TCGACACCTT CATCGAGCTC GGACCGGACG 
39601 TCCGCGAGGA GGAAGGAGAC GCCCCACGCC 

39661 GCTCCCGCGC CGACGGGGGG CGGCGACCCG
39721 GCGACGAGAC GACCACCTGC CTCGGGGCCC 
39781 TCGACCTCGC GGCCGTGCAC GGCGCCCCCG 
39841 CCTTCCAACG CACGCGCTAC TGGCTGGACG 
39901 CCGGTCTGGA GGCGACGGAC CAGCCCCTGC 
39961 AAGGCACCGT ACGGACCGGA CTGCTCTCCC 
40021 GCGTCCGCGA CCACGCCGTG GTGCCCGGAG 
40081 GCACCGAAGC GGGCTGCCCC CGGGTCGCCG 
40141 CCGAGAACGG AGAAGGAGTC CGACTGCGCG 
40201 TCCGTTCGCT ACGCATCGAC TCCCGGCCCG 
40261 CCGACTGGAC CCGCCACGCG TCCGGCACCC 
40321 GCACCGGCGT GCCGACCGAA CTGCTCGGCG 

40381 CCCTCGACGC GGACGCCGTC GCGGCCGAGT
40441 ACGGCCCCGC GTTCCGGGCA CTGCGCGCCG 
40501 AGGTCCGGCT TCCCGGCCAG GCGGCCGCCG 
40561 TGCTGGACGC CCTGACGCAC GCCACCGGGT 
40621 TGGTCCCGTT CGCCTGGAGT GACGTCCGGA 
40681 TACGCATCGC GCCGGCCGGC CCCGACGCCG 
40741 GCCCGGTCCT CGCCGCCCGC TCGCTCACGC
40801 ACCCGGAGGC GGACAGCACG CCGCTGTACC
40861 TGACCGGGCA CGCCGGTCCA AGGCAGGCGG 
40921 TCCAGGCCCT CCTCGACGCC GTGCGCGACG 
40981 ACCTGCTCGC GCTCGCGGCC TCCGACACGG
41041 GCCACGACGG GGACGCTCTC GCCACGGGCG 
41101 TGGTACAGGG CTGGCTGACC CACGCCCGCT 
41161 AGGGGGCGGT GACGGCCGGC ACGAGCCCCG 
41221 TGCTGCGCAG CGCACAGTCC GAGCACCCGG 
41281 CCGACCCGGC CGCCTCGTAC CGTTCCCTGC 
41341 TCGCCCTGCG CGGCGCCGAG ATCCTCGTCC 
41401 CCACCGTGCC CGGACACCCC GGCGACGTCA 
41461 CGGCCCCGTC CGGCACCCCC TCCGGCCCCT 
41521 GCGGAACCGG GACCCTGGGC AAGGCCGTGG 
41581 GGCACCTGAT CCTGGCCGGT CGGCGAGGCG 
41641 CCGAACTGAC CGGCCTGGGC GCCACCGTGA 
41701 CGGCGCTCGA AGGCGTCCTG GCCGCCGTCC 
41761 ACACCGCCGG AGTGCTCGAC GACGGCATCG 
41821 CGGTCCTGCG CGCCAAGGCG GACGCGGTCA 
41881 ACCTGTCCGC CTTCGTCCTC TTCTCATCGG 
41941 CCGGCTACGC GGCCGCCAAC AGCTTCCTCG 
42001 GCCTCCCCGC CGTGTCCCTG GCATGGGGAC 



CCGGCGGCGC TCACGGACCC GGCGTGATCA 
CACTGCTCTT CACCGGACAG GGCAGCCAGC 
GGTACGAGGT GTTCGCCCGC GCCCTGGACG 
ACCGCCCGCT GCGCGACGTG ATGTTCGCGG 
ACCGCACCGA GTACACCCAG CCCGCACTGT 
TGACCGCCGG GGGCCTGCGC CCCGACGCAC 
CCGCCCACGT CGCCGGAGTG TTCACCCTGC 
GCCGACTGAT GGGCGAGCTG CCGGCCGGTG 
CGGAGATCGA GGAGACGATC ACGGCGCTCG 
CCGCACTCAA CGGTCCCGAC GCCACCGTGA 
TCGCCACGCT GTGGCGGGAG CGGGGCCGCC 
TCCACTCGCC GCACATGGAC GCCGCACTGG 
CCTACGCCGA ACCGCGCATC CCGGTGGTCT 
CGACGCTGTG CGCCCCCGAG TACTGGGTGC 
ACGGCTTCCG CGCCCTGCGC GACCAGGGGA 
GCGTGCTGTC CGGCCTGGGC CGCGACTGCC 
AGGACGGCTC GGCGGACCCC GACACGACCG 
TGCTGACGGT GCCGCTGCTG CGCCGGGACC 
TGGCCACCGT CCACACCCAC GGCGTCCCCG 
AGGGGCCCGC CGTCGAGCTC CCCACCTACG 
CCCCGGCCCC CGCCGCCGGC CCCACCGCCA 
TCCCGGCCGT CATCGACCTG CCCGACGGGG 
TGCGCACCCA TCCGTGGATC GCCGACCACC 
CGGCCCTGCT GGACGTCGCC GCCTGGGCGG 
AACTGACCTT CGCCACGCCC CTCGTCCTGC 
TGACGGTCAG CGGCCCCGAC GCGGAAGGCA 
CCGACACGGT CCGTACCGCC GATGCCCCCT 
TCGTCCCCGC ACCCGAGGAG GCCGGCGACG 
CCTGGCCCCC GGCCGACGCG ACCCCGGTCG 
ATCAGCGCCT CGCGGCCGGC GGCGTGACGT 
TCTGGCGCCG CGGCGCAGAG GTCTTCGCCG 
ACGCCTCGCG GTACGGCATG CACCCGGCCC 
TCGGCGAGCG GTCCACCGAG GCCCGCGGCC 
TCCACGTCCG CGGCGCCGAC TCCCTGCGCG 
TGACCGTCGC CGCGGTCGAC CCGACGGGCC 
TGCGCCCCCT GGCGGAGAGC CGGTTCCAGG 
GACTGGAGTG GACACCGGCC CCCGGTTCCG 
CAGCGGAGTG GGGCGTCCTC GGCGACCCGG 
GGGCGGAGGC ACCCGTGCGA ACCCATGACG 
CCCCGCCCGA CCATGTGCTG GCGCTGCTGG 
CCCACGACCT GGCCGCACGC GCCCTGGCCC 
TCGCCGACGC GCGACTGGTC GTGCTGACGC 
TCCACCCGGC CGCGGCCGCC GCCTGGGGGC 
GCCGCTTCGT CCTCGTCGAC GCGGACCCCG 
CACGGGCCGT CGCCTCCGGG GCGTCCCAAC 
CCCGGCTCGC CCGGGGAACG GACCGACAGG 
CCGCACCGGA GACGACCGCT GCCCCCGAGC 
GGCCCGCGGA CGGCACCGTG CTCGTCACGG 
CCCGGCACCT CGTGACCAAG CACGGTGTCC 
CGGACACCCC CGGGGCGGCC GACCTGGCCA 
ACATCGTCCG CTGCGATGCC GCCGACCGCT 
CCGCCGCGCA CCCGCTCACC GCCGTCGTGC 
TCACGGCGCA GACGCCCCGA CGCCTCTCGG 
GCCACCTGCA CGAACTGACC CGCGACCTGG 
CCGCCGGAAC CCTCGGCAGC CCCGGCCAGT 
ACGCGTTCGC CGCCTGGCGG CGAGCGCAGG 
TGTGGGGTGA CGGCGGTGAC GGTCGTGACG 
42061 GCGGTGGCTC GGCGGCCGAC GGCATGGGAG
42121 TGCGCCGTTC CGGCATCCTC CCGCTCGACC 
42181 CCTGCGACCC CGCCAGGACC GAGGCCGTAC 
42241 TGCGCGCCCG TTCCGCCCGC GGCGCCGTAC 
42301 CCTTGGTGCC CCCGCCCGCC GGTGCCGGAT 
42361 CGGCGGGCCA GGCGCCCCCG GCCCCGGCGT 
42421 AGCCCCGAGG CGAACGGCTC ACCGCCCTCA 
42481 TACTCGGGCA CCCCGACTCC GGCCGCGTCC 
42541 TCGACTCGCT CACCGCCGTC GAACTCCGCA 
42601 TTCCCGCCAC CCTCGTCTTC GACCATCCGA 
42661 AGGAACTGCC GAAGGCAGCA CAGGAGATCC 
42721 TCGACCGGAT CCGGGACGGA CTCGCCACCG 
42781 TCGCGGAGCG TCTCCAGGCG TTGCTCGGCA 
42841 CGACCGGCAG CCCGGGCGAG CACGACCGGC 
42901 GACTCGCCGC CAGCAGCGAC GACGAACTCT 
42961 CGTAGCCCAG GGCCACCCTT CCGCCTCCGC 
43021 GATGTCTTCC ACATCGCCCG CCACCAACGA
43081 CATGACCGAC CTGCACGAGG CCCGCGAGCA
43141 GCCGATCGCG ATCGTGGGGA TGGGGTGCCG 
43201 GTTGTGGGAT CTGGTGGCGT CGGGGGTGGA 
43261 TTGGGATGTG GGGGGTCTGT TCGATCCGGA 
43321 TGAGGGCGGG TTCCTTCATG AGGCGGGGGA
43381 GCGTGAGGCG TTGGCGATGG ATCCGCAGCA
43441 GTTGGAGCGG GCGGGCATCG ATCCGCACAC 
43501 CGGAGTGATG TACCACGACT ACGGCAGCAC 
43561 GACCGCCGGA TTCCTCGGCA CGGGGACGTC 
43621 TGTGCTGGGG CTGGAGGGGC CGGCGGTGAC
43681 GGCGCTGCAT CTGGCGGTGC GGGCGCTGCG 
43741 TGGGGTGACG GTGATGGCGG AGCCGGGTGT
43801 GGCTGCGGAC GGGCGGTGCA AGGCGTTCGC
43861 GGGTGTGGGC GTGCTGGCGG TGGAGCGGTT
43921 CCTGGCGGTC GTGCGTGGTT CCGCGGTCAA

43981 TCCGAACGGT CCGTCGCAGC AGCGGGTGAT

44041 TGTGGCTGAT GTGGATGTGG TGGAGGGGCA
44101 CGAGGCGCAG GCGTTGTTGG CGACGTATGG 
44161 TGGTTCGCTG AAGTCGAACG TGGGGCATAC 
44221 CAAGATGGTC ATGGCGATGC GGCACGGTGT 
44281 GACGGCGGAG GTGGACTGGT CGGCGGGCGC
44341 GCCTGAGGTG GGGCGTCTGC GTAGGGCTGC
44401 CGCGCATGTG GTGGTGGAGG AGGCACCTGC 
44461 GGAGCCGGTG GAGCCGGGGG CTGTGGGGCT 
44521 TCGTTCTGCG GGTGCGGTGG CGGAGCTGGC 
44581 TCGGTTGGTG GATGTGGGGT TGTCGTCGGT 
44641 CGTGGTTCTG GCGGAGGACT CTGCCGAGCT 
44701 GGTGCCGTCG CCTGGCGTGG TGTCGGGTGT
44761 CGTGTTCCCT GGTCAGGGGA CGCAGTGGGC 
44821 GGCGGTGTTC GCGGAGTCGA TGGCGCGGTG
44881 GCGTCTGGCG GATGTGCTGG GTGACAGGTC
44941 GGCGTCGTTC GCGGTGATGG TGTCGCTGGC
45001 CGATGCGGTG GTGGGGCATT CGCAGGGGGA 
45061 CTCGCTGGAG GACGGCGCGC GTGTGGTGGT 
45121 GGCCGGGCAC GGTGGGATGG CGTCGGTGGC 
45181 GGCGGCGTGG GCGGGGCGTC TGGGTGTGGC 
45241 CGCGGGTGAT GTGGACGCGG TGGCGGAGTT
45301 GGCGCGTGTT CTGCCGGTGG ACTACGCCTC 
45361 CGAGCTTGAA CAGATTCTGG CCGGCATCGA 
45421 CACGGTGGAG GCGGGTGTCG TGGATACGGC 



CGAGCCTGGC CGCCGCCGAC CTGGCACGGC 
CGGCCGAAGC GCTGCGCCTG TTCGACGAGG 
TGCTGCCGAT CCGCCTCGAC CTGACCGGCC 
ACGCGAGCGT GGTGCCCGAA GTGCTGCACA 
CCCCGGCCGG TGCCGACGCG TCGGATCCCG 
CCGACACCCT GGCCGAACGG CTCGCCGGGA 
CCGAACTGGT ACGCACCGAG a TceiccTcnn, 
AGCTCCAGTC CTCCTTCAAG GAGTCCGGCT 
ACCGGCTCAC CGCGGCCACC GGAACGAAGC 
CACCCGCGGC ACTCGTCGAC CACCTGGAAC 
CGGCGGACCT CCCGGCCGTT CTCGACGCAC 
CCGCCACCGA CGACAGCAGC CGCGACCACA 
CGCTCACCTC GGCTGCGGGC GTCAGCCGCC 
AGGGCCCCGA TGAGCTGTCG CTCGGCCAAC 
TCGACCTCTT CGACAGCGAC TTCCGATCGA 
CGCCCGCCCC ACACCCCTGG AGAACAGCAC 
AGAGAAGCTC CGCGACTACC TCCGCCGCGC 
GATCCGCCGG ACCGAGTCGG CCAGGCACGA 
TCTGCCGGGT GGGGTGTCGT CGCCGGAGGG 
TGCGGTTTCT CCGTTCCCCA CGGATCGGGG 
GCCGGGGGTG CCGGGGCGTT CGTATGTGCG 
GTTCGACGCG GGGTTCTTCG GTATCTCTCC 
GCGGTTGCTG TTGGAGACGT CGTGGGAGGC 
GCTCCGCGGC TCACGGACCG GCGTCTACGC 
CGCGACCGTC TCCGTCGCCT CCGACGACGA 
CGGGAGTGTG GCTTCGGGTC GTATCTCGTA 
GGTGGACACG GCGTGTTCGT CGTCGTTGGT 
GAGTGGTGAG TGTGACCTTG CTCTTGCGGG 
GTTCGTGGAG TTCTCGCGGC AGCGGGGGTT 
GGCTGCTGCC GATGGCACGG GCTGGGCCGA 
GTCGGATGCG GTGCGTCACG GGCGCCGGGT 
CCAGGACGGT GCGAGCAATG GGTTGACGGC 
TCGTCAGGCG TTGGCGGATG CGCGGTTGGG 
TGGGACGGGG ACGCGTCTGG GTGATCCGAT 
GCAGCGGGAT GCGGGTCGGG CTTTGCGGCT 
GCAGGCGGCT GCCGGTGTGG CGGGCGTGAT 
CCTGCCGAAG ACGCTGCACG TGGATGAGCC 
GGTGTCCCTG CTGAGGGAGC AGGAGGCGTG 
GGTGTCTTCG TTCGGTGTGA GTGGGACGAA 
TTCGGAGGCG CCGGTCGCGG GGGAGCCGGT 
TCTTCCGGTG GTGCCGGTGG TGGTGTCGGG 
CTCCCGCTTG AACGAGTCGG TTCGTTGGGA 
GGTGTCGCGG TCGGTGTTCG AGCACCGGTC 
GCATACCGGT CTGGTTGCTG TCGGGACTGG 
GGCGTCGGTC GAGGGTGGCC GGTCGGTGTT 
GGGGATGGCG CTCGGGTTGT GGGCGGAGTC 
TGAGGCGGCG TTCGCCGGGT TGGTGGACTG 
TGCGTTGGAG CGGGTCGATG TGGTGCAGCC 
CGAGCTGTGG CGGTCGCTGG GTGTGGTGCC 
GATCGCTGGT GCGGTGGTGG CGGGTGGTCT 
GTTGCGTGCG CGGTTGATAG GCCGTGAGCT 
GCTGCCGGTC GCGGTGGTGG AGGAGCGTCT 
GGTGGTCAAC GCACCCTCCG CCACGGTCGT 
TGTGACCGCG TGCGAGGTGG AGGGGGTTCG 
GCACTCGGCG CACGTGGAGG AGCTGAGGGC 
CCCGGTGGCC GGTGAGACCC CCCTGTACTC 
GTCGATGGAT GCGGGGTACT GGTTCAGGAA 
45481 TCTGCGTCGG CCGGTTCGTT TCCAGGAGAC GGTCGAGCGG TTGCTGGCGG ATGGTTTCCG
45541 GGTGTTCGTG GAGTGCGGCG CGCATCCGGT GCTGACGGGG GCGGTGCAGG AGACCGCGGA
45601 ATCCACCGGT CGCCAGGTGT GTGCGGTCGG ATCCCTGCGT CGTGACGAGG GTGGTCTGCG 
45661 ACGCTTCCTG ACCTCTGCGG CGGAGGCGTT CGTCCAGGGG GTGGAGGTGT CCTGGCCGGT
45721 GCTGTTCGAT GGCACCGGCG CCCGCACGGT CGACCTGCCC ACCTACCCCT TCCAGCGTCG 
45781 GCGTTACTGG CTGGAGTCAC GTCCTCCTGC GGCGGTTGTT CCGTCGGGGG TCCAGGACGG 
45841 ATTGTCGTAT GAGGTGGTGT GGAAGAGCCT GCCGGTACGG GAGTCGTCGC CTCTTGACGG
45901 CCGGTGGCTG CTCGTCGTGC CCGAAACCCT GGACGCCGAC GGCACGCGGA TCGCCCACGA 
45961 CCTCCAGCAC GCCCTCACCA CCCACGGCGC CACCGTGCAC ACTCTTGCTC TTGACCCCAG 
46021 CGCGGCGCAC TTCGACGGTC TCTTTGACGG GATACTCCAG GAAGAAACAG ATGTCACGGG 
46081 CATCTTCTCT CTCCTCGGAC TGGCATCGGG CCCGCACCCG GATCACGGCG AGGTGGAGCT 
46141 CGCGGGAGCC GCGTCGCTGA CGTTGATGCG CCAAGCCCAG CGAGACGGCT TCCGTGCTCC 
46201 GGTGTGGGCG GTGACGCGGG GTGCGGTGTC CGTGGTGCCT GGTGAGGTGC CGGAGACCGC 
46261 GGGTGCGCAA CTGTGGGCGC TCGGCCGGGT CGCCGGTCTC GAACTCCCCG ACCGTTGGGG 
46321 TGGTCTGATC GATCTCCCGG CGGATGCCGA TGCGCGTACG GCGGGGCTTG CGGTGCGGGC 
46381 CCTGGCCGCC GGGATCGCCG ATGGTGAGGA CCAGCTGGCG GTGCGCCCCT CAGGTGCCTA 
46441 CGGCCGGCGC CTCGTACGAG CCACCGCGCG CCGGGGACGG AAGGACTGGC GCCCGCAGGG 
46501 TACGGTGCTG CTCGCCGGGC ACCTCGACGC CGTCGGTGAA CCACTGGCCC GATGGCTGCT 
46561 CACCGGCGGC GCGGACCACG TCGTCCTTGC GGATCCCGCC CTGACCGAAC TCCCGGCCAC 
46621 CCTCGCGGAT CTGGCCCAGA CCGTGACGAC CGCTGCGGCA CCCGACCTTG CCGACCGTGC 
46681 AGTCCTCGCC GCCCTGGTCA CCGAGTACGT ACCCGCCACC GTGGTCGTCG TTCCGCCCGC 
46741 GGCGGAGCTC GCTCCGCTGG CGAGTATCAG CCCGGCCGAC CTCGCGGCGG CCGTCACCGC 
46801 CAAGTCCGCG ACCGCGGCGC ACTTCGACGC GCTGCTCGAC GGACCCCACG CACCGGAGCT 
46861 GGTGCTGATC TCCTCGGTCG CGGGGATCTG GGGTGGTGTC CGGCAGGGTG CGTACGCCGT 
46921 CGGTGCCGCT CACCTCGATG CCCTGGCCGC CCGCCGCAGG GCCCGCGGTC TGTCGGCGGC 
46981 CTCCGTCGCG TGGACGCCCT GGGCGGGTTC CGTCACCGCG GACGGCTCCG CCGCCGAGTC 
47041 GCTGCGGCAG TACGGCATCG CTCCGCTGGA GCCGCAGGCG GCGCTCGCGG AGCTGGACCG 
47101 GGCGCTGAAC CAGCAGCTGC ACGGCGGCGG GGGCGACGCG GCGGTGGCCG ACATCGACTG 
47161 GGAGCGGTTC CTCGCGTCGT TCACCTCCGT ACGTCCCAGC GTTCTCTTCG ACGAGCTGCC
47221 CGAGGTACGC CGTCTCCGCG AGGCGGAGGC GGCGGCCATG GCGGACCAGG CCGCCGCCCG 
47281 GACGGGAGCG CCCGGCGGAA CGGAGCTGGC GCGCTCTCTG CGGGCCAAGT CCCTGAACGC 
47341 CCAGCGAACT GCGCTCCTGG AATTGGTCAC TGCCCACGTG GCGGCCGTGC TGGGAGAGAG
47401 CGTTCCCGAG GCGATCGACC GGAGCCGGGC GTTCAAGGAC ATCGGCTTCA CCTCCATGAC 
47461 CGCGATGGAA CTGCGCAACC GGCTCAAGGA GGCCACCGGG CTCGCCCTTC CTGCCTCCCT 
47521 CGTCTTCGAC CACCCCCACC CCGGCGCACT CGCCGACCAC CTGCGCGAGG AACTCCTGGG 
47581 CGAGGACGGT GCGGCGGGCG CCGACTCCGC GGCGGAGGAA CCGAGCGCTA CCTCTCCGAC 
47641 GGTCCAGGAC GAGCCGATCG CCATCATCGG CATGGCCTGC CGCCTCCCTG GTGACGTCGG 
47701 AACACCCGAC GAACTCTGGG AGCTGCTGGA AACCGGCCGC GACGCGATGT CGGACCTGCC 
47761 CGTCAACCGC GGGTGGGACG TGGCGGGGCT CTACGACCCG GATCCAGACG CGGCGGGGCG 
47821 TTCCTACGTC CGGGAGGGCG GGTTCCTCCA CGACGCGGGG GAGTTCGACG CGGAGTTCTT 
47881 CGGCATCTCG CCGCGTGAGG CGCTGGCGAT GGACCCGCAG CAGCGCATCG TCCTCGAACT 
47941 CGCCTGGGAA TCGTTCGAAC GTGCGGGCCT GGACCCGGCC GGCCGCCGCG GCAGCCGTAC 
48001 CGGCGTGTTC ATGGGAACCA ACGGCCAGCA CTACATGCCG CTGCTGCAGA ACGGCAACGA
48061 CAGCTTCGAC GGCTACCTCG GCACCGGGAA CTCGGCCAGT GTCATGTCGG GCCGCATCTC 
48121 GTACACCCTC GGCTTGGAGG GGCCGGCGCT GACGGTGGAC ACGGCGTGTT CGTCGTCGCT
48181 GGTCGCGCTG CATCTGGCGG TGCGGGCGCT GCGCAACGGT GAGTGCGACC TCGCCCTCGC 
48241 CGGCGGCGCG ACCGTGATGT CGACGCCGGA AGTCCTGGTG GAGTTCTCCC GGCAGCGTGC 
48301 AGTCTCCGCA GACGGTCGCT GCAAGGCGTT CTCCGCCTCG GCCGACGGCT TCGGACCGGC 
48361 CGAGGGTGCG GGCGTGTTGC TCGTCGAGCG GTTGTCGGAT GCGGTGCGTC ATGGGCGTCG 
48421 GGTGTTGGCG GTCGTGCGTG GTTCGGCGGT GAATCAGGAC GGTGCGAGTA ATGGGTTGAC 
48481 GGCTCCGAAC GGTCCGTCGC AGCAGCGGGT GATTCGTCAG GCGTTGGCGG ATGCGCGGTT 
48541 GGGTGTGGCT GATGTGGATG TGGTGGAGGG GCATGGGACG GGGACGCGTC TGGGTGATCC 
48601 GATCGAGGCG CAGGCGTTGT TGGCGACGTA TGGGCAGCGG GATGCGGGTC GGCCGTTGCG
48661 GCTTGGTTCG TTGAAGTCGA ACGTGGGGCA TACGCAGGCG GCTGCCGGTG TGGCGGGCGT
48721 GATCAAGATG GTCATGGCGA TGCGGCACGG TGTCCTGCCG AAGACGCTGC ACGTCGATGA 
48781 GGTCTCTCCG CACGTCGACT GGTCGGCGGG TGCGGTGTCC CTGCTGACGG AGCAGGAGCC
48841 GTGGCCGGAG GTGGGGCGCC CTCGCAGGGC TGCGGTCTCT TCGTTCGGGC TCAGCGGGAC 
48901 GAATGCGCAT GTGGTGGTCG AGGAGGCGCC GGTCGGGGAG GCGGGGCAGG CCGCCGGGGA
48961 TGCTCGGTTG GCTGTGGTGC CGGTGGTGGT GTCGGGCCGG TCTGCGGGTG CGGTTGCTGA
49021 ACTGGCCTCC CGCTTGAACG AGTCGATTCG TTCGGATCGG TTGGTGGATG TGGGGTTGTC
49081 GTCGGTGGTG TCGCGGTCGG TGTTCGAGCA CCGGTCCGTG CTACTGGCGG GGGACTCTGG
49141 CGAGCTGCAT ACCGGTCTGG TTGCTGTCGG GACTGGTGTG CCGTCGCCTG GTGTGGTGTC
49201 GGGTGTGGCG TCGGTCGGGG GTGGCCGGTC GGTGTTCGTG TTCCCTGGTC AGGGGACGCA
49261 GTGGGCGGGG ATGGCGCTCG GGTTGTUtiGC GGAGTCGTCG GTGTTCGCGG ACTCGATGGC
49321 GCGGTGTGAG GCGGCGTTCG AGGGGTTGGT GGACTGGAGT CTGGCGGATG TGCTGGGTGA
49381 CGGGTCCGCG TTGGAGCGGG TCGATGTGGT GCAGCCGGCG TCGTTCGCGG TGATGGTGTC
49441 GCTTGCTGAG CTGTGGCGGT CGTTGGGTGT GGTGCCGGAT GCGGTGGTGG GGCATTCGCA
49501 GGGGGAGATC GCTGCTGCGG TGGTGGCGGG TGGTCTGTCG TTGGAAGACG GGGCGCGTGT
49561 GGTGGTGTTG CGTGCGCGGT TGATCGGCCG TGAGCTGGCC GGGCGCGGTG GGATGGCGTC
49621 GGTGGCGCTG CCGGTCGCGG TGGTGGAGGA GCGTCTGGCG GGGTGGGCGG GGCGTCTGGG
49681 TGTGGCGGTG GTCAACGGAC CGTCCGCCAC GGTCGTCGCG GGTGATGTGG ATGCGGTGGC
49741 GGAGTTTGTG ACCGCGTGCG AGGTGGAGGG GGTTCGGGCG CGTGTTCTGC CGGTGGACTA
49801 CGCCTCGCAC TCGGCGCACG TGGAGGACCT GAAGGCCGAG CTTGAAGAGG TGCTGGCCGG
49861 CATCGGCCCG GTGACCGGTG GGATCCCGTT CTATTCGACG TCCGAAGCCG CGCAGATCGA
49921 CACGGCTGGT CTGGACGCGG GGTACTGGTT CGGGAATCTG CGTCGGCCGG TGCGGTTCCA
49981 GGAGACGGTC GAGCGGTTGT TGGCGGATGG TTTCCGGGTG TTCGTGGAGT GTGGTGCGCA
50041 TCCGGTGCTG ACGGGGGCGG TGCAGGAGAC CGCGGAATCC ACCGGTCGCC AGGTGTGTGC
50101 GGTCGGATCC CTGCGTCGTG ACGAGGGAGG TCTGCGCCGC TTCCTCACCT CTGCGGCGGA
50161 GGCGTTCGTC CAGGGGGTCG GGGTGTTCTG GCCGGCACTG TTCGACGGCA CCGGCGCCCG
50221 TATCGTCGAC CTGCCCACCT ACCCCTTCCA ACGACGGCAC TACTGGTACA ACGACCCTGC
50281 CCGCCGCACG GGCGATGCCA CCTCCTTCGG TATGGCGCAG GCCGGTCATC CCTTGCTCGA
50341 TGCCGGTACG GAGCTCCCTG AATCAGGCGA GCACCTCTAC ACCGCTCGGC TCGCCGCCGA
50401 CTCGCATCCT TGGCTGCTGG AACACACCCT GCTGGGTGCG CCGTTGCTGC CCGGTGCGGC
50461 GTTCGTCGAC CTCGTCCTGT GGGCCGGCGG GGAGGTCGGA TGCGACCTGA TCGAAGAGCT
50521 GACGCTGACC TCGCCGCTGC TGTTGTCCGA CAGCGCTGCC CTTCAACTGC GGCTGGTCGT
50581 GGGCACGGCG GACGCCGAGG GACGTCGTAC GATCACCGTC CACTCGCGGC CGGACGGAGA
50641 CCCGCGTACG ACGCGCACAC CTGCGGCATC GTCCGAGACC AGCCCGGACG CGGAGTCGGA
50701 CACGGAGATC CGTAGGGACA CGTCCGCCTG GACGAAGCAC GCTCAGGCGA CGGTCGCCCC
50761 CGCCCCTGAC GTCCCCCCTT CCGGGGTGGA CGCGGAAGGG GACGCCGTTC GCCCCGCAGT
50821 GGAATGGAGC GTGGCGGCGA CGGAGTCGGA TGCCTTCCAG GCCGAGGACT TCTACGCGTC
50881 CTTCGCCGCA CACGGCTATG GCTACGGCCC GCTGTTCCAG GGCGTACGGT CAGGCCGTCA
50941 GGACGGGACC GACGTCTACG CCGAAGTCGC CCTGGATCAC GACCGCTTGC CGTCTGCCGA
51001 GCAGTTCGGC CTGCACCCCG CGCTGCTCGA CGCGGCGTTC CAGACGATGC GTCTGGGATC
51061 GTTCTTCCCC GACGACGGAC AGGCACGTGT GCCGTACACC TTCCGGGGGA TTCGTCTCTA
51121 CGCCCCGGGA GCCGCGCGCC TGCGGGTCCG TGTCTCGGCG GTCGGGGCCG ATGCCGTACG
51181 CGTGGAGTGC GCCGATGAGC GGGGGCGGCT CGTCTGTGAG ATCGACGCCC TCGTCGTCAG
51241 CACGGTCTCC CCGGACCAGT TGCGGCCGGC CGGACAGGAC GCGACCCAGG ACATGCTGCA
51301 CCGGATCGAG TGGCCCGTCC TCTCCCCGCC GACCGGCAGC GCCACCTCCC CTGCTCCGCC
51361 CCGCTGGATC GTGGTCGGGG GCGAGGACGA GGGCCTCGGG CTCGGGGGCC TTCGACTCGA
51421 CGGTCCGAGG CTTGACGGTC CCGGGCTTGC GGAAGCGCTG TCCGAAGCCG GTATGGGGAC
51481 CGAGCGTCAC CGGAACCTGG CCGACGCGCT GTCGGCCGTA CGGACGCCGG TGGACACGGC
51541 AGGCTCCGCT GCCGCCGCCG GCACGACCTC CCTCATAGCC GTCCCCGTAC CGCAGTCGCC
51601 CACCATGGAC GCCGGTGCCG TGCGCCACGC CGTCCACCGA GCCCTGGAGC TGGTGCAGGG
51661 CTGGGTGGCG GCGGACGAGG CGGCGGAAGA GGGCGGGAGC GACGGTGCCG CGGCCGACCG
51721 GCGGCTGGTG CTGGTCACGA GCGGAGCGGT GTCCACGGGT GACGCCGACC CGCTGCGCGA
51781 CCCGGTGGCC GCGGCCGTCT GGGGTCTGAT CAAGTCCGCC CAGTCGGAGC AGCCCGGCCG
51841 CATCGTCCTC GTTGACCTCG ACGAGGGAGC CGTGGACGGG GCGGCCTTGG CAGCCGCGAT
51901 CTCGACCGGC GAACCACAAC TCGCCCTGCG CGACGGCGAT GTGCACGTGC CCAGGCTGGC
51961 ACCCCTGTCC GTGCGGGACT CGCAGACGCT GCTGCCGCCC GCCGGTACGC GCGCCTGGCA
52021 TCTGGTCGGC GCCGGCACCG GAACCCTTTC GGACCTCGCG CTCGTACCGG CGCAGACCGA
52081 CACCGTCGCG CTCGCACCTG GGCAGGTGCG GATCGCGGTG CGAGCCGCCG GACTCAACTT
52141 CCGGGACACG CTCATCGCGC TCGGTATGTA TCCGGGCGAG GGCGTGATGG GCGCCGAGGG
52201 CGCCGGAGTG ATCACCGAGG TCGGCCCGGA CGTGGTGAGC CTCGCCGTCG GGGACCGCGT
52261 CCTGGGCATG TGGACCGACG GGTTCGGGCC GTACGTCGTG GCCGACCACC GCATGGTGGC
52321 CCCGATGCCG CGCGACTGGT CCTACGCGGA GGCCGCTTCG GTACCCGCCG TCTTCCTCAG
52381 TGCCTACTAC GGACTCAGGC ACCTGGCCGG TCTGCGCGCC GGCCAGTCGG TGCTGGTGCA
52441 CGCGGCGGCG GGCGGTGTGG GCATGGCTGC CGTCCAACTC GCCCGGCACT TCGGGGCCGA
52501 GGTCTTCGGC ACGGCCGGCA CGGCCAAATG GGACGCACTG CGGGCACAGG GCCTGGACGA
52561 CCGGCACATC GCCGGTTCAC GGACGCTGGA CTTCGCGGAC CGGTTTCTCG ACGCGACCGA
52621 GGGGCGCGGC GTGGACGTCG TCCTGAACTC GCTGGCGGGC GACTTCGTCG ACGCCTCCCT
52681 GCGACTGCTG CCGL'GCGGAG GGCGGTTCGT GGAACTGGGC AAGGCGGACG TACGCGACGC
52741 CGCGCAGGTC GCCGCCGACC GGCCGGGAAC CGTCTACCGG GCCTTCGAGC TGATGGAGGC
52801 CGGGCCGGAG CTGATCGGCC GCATGCTGAA CGAACTGCTG GAACTGTTCG AGTCCGGGGC
52861 GCTGCGCCTG CTGCCCGTCA CCCCGTACGA CATCCGGCGG GCACCCGACG CCTTCCGCAC
52921 GCTCAGCCAG GCCGGTCACG TCGGCAAACT GGTCCTGACG ATGCCACCGG CCTTCGAACC
52981 CCACGGCACG GTCCTGATCA CCGGCGGCAC CGGGAACCTG GGCGGAACAC TCGCCCGCCA
53041 TCTCGTGACC GAACACGGAG TGCGCCACCT GCTCCTGGCC GGACGTAGGG GGCCCGAGGC
53101 CGAAGGCGCC GCCGAACTCG TACGGGAACT GCACGACCTG GGCGCCTCCG TCACGGTCGC
53161 CGCCTGTGAC GTGGCCGACC GAGCGGCGCT CCGGAAACTC CTCGGCGGCA TACCGCCGGA
53221 GCGCCCGCTC ACCGGAGTCG TCCACGCGGC GGGCGTTCTC GACGACGGCG TGGTCACGTC
53281 ATTGACCCCG GACAGGGTCG ACGGCGTCCT GCGGCCCAAG GTGGATGCCG CTCTCAACCT
53341 CCACGAAGCG GCCCTCGATC CCGAACTCGG TCTCGACATC ACCGCGTTCG TCCTGTTCTC
53401 GTCCGTCGCT GCCCTGCTCG GCGGCTCGGG TCAGGGAAGC TACGCCGCCG CCAACGGCTT
53461 CCTCGACGGA CTCGCCCAGT ACCGGCGTGG CCGCTCGCTG CCCGCGCTCT CCCTCGGCTG
53521 GGGCCTGGCC GGGAGCGGCC GGATGACATC CCACCTGGAC AGCCGGGCCC TGCTCCGGCG
53581 CATGGCCAGG GGCGGTGTCC TGCCGCTGTC CCCGGCGGAG AGCATGGCAC TGTTCGACGC
53641 GGCCCAGGGC TTCGACGAGG CGCTCCAGGT GCCGGCGCGC TTCCACACCG CCGCACTGGG
53701 CGCCGACGGC AACGTCCCGC CGCTCTTCAA CGGACTGATC CGGGGCGGGA CGGCGCATGC
53761 CGAGGCCCGG CGCAGGACGG TCGGCGCGTC GCCCGCGGGT GGTCCTGCCG GAGGCGAGCC
53821 GGTGAACCTC GCCGACCGGC TGTCCGGACT GACGGAGGAC GAACAGCGGG CGCTGCTCCT
53881 CGACACGGTG CGCACGCACG CGGCTCTCGT CCTGGGCCAC ACGGGCACGG ACGGCATCCA
53941 GGCCGACCGG GCGTTCAAGG ACCTGGGATT CGACTCGCTG ACGGCCGTCG AGATGCGCAA
54001 CCGGCTCACC GCCGCCACGG GGCTGCACCT CGCGGCGACG CTGGTCTTCG ACCACCCCGC
54061 TCCGGCGGAC CTCGCCGAGC ACCTCCGCTC TCGACTCGTC CCCGAGGGGA CGGACGTACC
54121 GCCGCTCCTC GCGGAACTCG GTCGGCTCGA AACGGCGTTC AAGAAGCTGA CCACCGCGGA
54181 CCTCGCCTCG GTCGTGCCCG ACGACATCGC CCGCGACGAG ATCGCCGTAC GTCTCGCCGC
54241 CCTCGGTTCC CTGTGGAACG GGCTCCATGG CAACGGCCTC AGCGGAGACG CGGCGCAGAA
54301 GCACGGCGAC TCGATCGTCG AGGACATCGA CTCCGCCGAC GACGACGAGA TCTTCGCCTT
54361 CCTCGACGAG AGCTTCGGCG ACTCCTGACC GCAGGCACCT CCGTACGGAC CGACGACTCT
54421 GCAGACGGGT GATGAGAGAT GGCGACCGAA CACGAGCAGA AGCTCCGCGA CTACCTCAAG
54481 CGGGCCACCA CCGAACTCCA CAAGGCCACG GAACGGCTGA AGGAGGTCGA ACAACGCGCT
54541 CACGAGCCGG TTGCGATCGT GGGGATGGGA TGCCGGTTCC CGGGCGGGGC GTCCTCGCCT
54601 GAGGAGTTGT GGGACCTGGT GGCGGCGGAG ACGGACGCGG TCTCCCCCTT CCCGGTGGAC
54661 CGGGGGTGGG ACGTGACGGG GCTGTACGAC CCGGATCCGG ACGCGGCAGG GCGTGCCTAC
54721 GTCCGCGAGG GCGGGTTCCT CCACGACGCG GGGGAGTTCG ACGCGGGGTT CTTCGGAATC
54781 TCTCCGCGTG AGGCGTTGGC GATGGATCCG CAGCAGCGGT TGCTGCTGGA GACGTCGTGG
54841 GAGGCGTTGG AGCGGGCGGG CATCGATCCG CACACGCTGC GCGGCACGCG GACCGGGGTC
54901 TACATGGGTG CCTGGAACGG CGGATACGCC GAGGGGATTC CCCAACCCAC GGCGGAACTG
54961 GAGGCCCAGC TCCTCACCGG CGGCGTGGTG AGCTTCACCT CGGGCCGTGT GTCCTACCTC
55021 CTGGGTCTGG AGGGGCCGGC GGTGACGGTG GACACGGCGT GTTCGTCGTC GCTGGTCGCG
55081 CTGCACCTGG CGGTGCGGGC GCTGCGCAGT GGTGAGTGCG ACCTCGCCCT CGCCGGCGGC
55141 GCGACGGTGA TGTCGACGCC CGACGTGTTC GTGCGCTTCT CCCGGCAGCG AGGAGTGGCC
55201 GCGGACGGTC GCTGCAAGGC GTTCTCCGCG TCGGCCGACG GATTCGGACC GGCTGAGGGT
55261 GTGGGCGTGC TGGCGGTGGA GCGGTTGTCG GATGCGGTGC GTCATGGGCG TCGGGTGCTG
55321 GCGGTCGTGC GTGGTTCCGC GGTCAACCAG GACGGTGCGA GCAACGGACT GACGGCGCCG
55381 AGCGGACGAG CTCAGGCCCT TCTGATTCGT CGAGCGTTGG CGGATGCGCG GTTGGGTGTG
55441 GCTGATGTGG ATGTGGTGGA GGGGCATGGG ACGGGGACGC GTCTGGGTGA TCCGATCGAG
55501 GCGCAGGCGT TGTTGGCGAC GTATGGGCAG CGGGATGCGG GTCGGCCGTT GCGGCTTGGT
55561 TCGTTGAAGT CGAATGTGGG GCATACGCAG GCGGCTGCCG GTGTGGCGGG CGTGATCAAG
55621 ATGGTCATGG CGATGCGGCA CGGTGTCCTG CCGAAGACGC TGCACGTGGA TGAGCCGACG
55681 GCGGAGGTGG ACTGGTCGGC CGGCGCGGTG TCTTTGCTGA GGGAGCAGGA GGCGTGGCCT
55741 GAGGTGGGGC GTCTGCGTAG GGCTGCGGTG TCTTCGTTCG GTGTGAGTGG GACGAACGCG

55801 CATGTGGTGG TGGAGGAGGC GCCGGTTCCG GAGGACGGGG AGGCGGTCGG GGGCGGTGTG 

55861 CCTTTGGCTG TGGTGCCGGT GGTGGTGTCG GGTCGTTCTG CGGGTGCGGT GGCGGAGCTG 

55921 GCGGGCCGGG TCAGCGAGGT GGCTGCGTCT GGTCGGTTGG TGGATGTGGG GTTGTCGTCG 

55981 GTGGTGTCGC GGTCGGTGTT CGAGCACCGG TCCGTGGTAC TGGCGGGGGA CTCTGCCGAG 

56041 CTGAATGCCG GTTTGGATGC TGTGGCCGGT GGTGTGCCGT CGCCTGGTGT GGTGTCGGGT 

56101 GTGGCGTCGG GTGAGGGTGG CCUGTCGGTG TTCGTGTTCC CTGGTCAGCG GACGCAGTGG 

56161 GCGGGGATGG CGCTCGGGTT GTGGGCGGAG TCGTCGGTGT TCGCGGAGTC GATGGCGCGG 

56221 TGTGAGGCGG CGTTCGTCGG CTTGGTGGAC TGGCGCTTGT CGCAGGTTTT GAGCGATGGG 

56281 TCGGCGCTGG AGCGGGTGGA GGTGGTGCAG CCGGCGTCGT TCGCGGTGAT GGTGTCGCTT

56341 GCTGAGCTGT GGCGGTCGTT GGGTGTGGTG CCGGATGCGG TGGTGGGGCA TTCGCAGGGG 

56401 GAGATCGCTG CTGCGGTGGT GGCGGGTGGT TTGTCGCTGG AGGACGGGGC GCGTGTGGTG 

56461 GTGTTGCGTG CGCGGTTGAT CGGTCGTGAG CTGGCCGGGC GCGGTGGGAT GGCGTCGGTG 

56521 GCGCTGCCGG TCGCGGTGGT GGAGGAGCGT CTGGCGGGGT GGGCGGGGCG TCTGGGTGTG 

56581 GCGGTGGTCA ACGGACCGTC CGCCACGGTC GTCGCGGGTG ATGTGGATGC GGTGGCGGAG 

56641 TTCGTGGCCG CGTGCGAGGT GGAGGGGGTT CGGGCGCGTG TTCTGCCGGT GGACTACGCC 

56701 TCGCACTCGG CGCACGTGGA GGACCTGAAA GCCGAGCTTG AACAGATTCT GGCCGGCATC 

56761 GGCCCGGTGA CCGGTGGGAT CCCGTTCTAT TCGACGTCCG AAGCCGCGCA GATCGACACG 

56821 GCTGGTCTGG ACGCGGGGTA CTGGTTCGGG AATCTGCGTC GGCCGGTGCG GTTCCAGGAG 

56881 ACGGTCGAGC GGTTGTTGGC GGATGGTTTC CGGGTGTTCG TGGAGTGTGG CGCGCATCCG 

56941 GTGCTGACGG GGGCGGTGCA GGAGACCGCG GAATCCACCG GTCGCCAGGT GTGTGCGGTC 

57001 GGATCCCTGC GTCGTGACGA GGGAGGTCTG CGCCGCTTCC TCACCTCGGC CGCGGAGGCA 

57061 TTCGTCCAAG GCGTGGAGGT GTCCTGGCCG GCACTGTTCG AAGGCACCGG CGCCCGCACG 

57121 GTCGACCTGC CCACCTACCC CTTCCAACGT CGGCGCTACT GGCTGGAGTC GCGCCCTCCC 

57181 GCGGCGCCGA TCGAGACTGC CGCAGCCTCT GGCATCGAGA GCTGGCGCTA CCGCGTGGCG 

57241 TGGAAGAGCC TGTCGCTGTC GGAGTCGTCG CGTCTTGACG GCCGGTGGCT GCTCGTCGTG 

57301 CCCGAAACCC TGGACGCCGA CGGCACGCGG ATCGCCCACG ACATCCAGCA CGCCCTCACC 

57361 ACCCACGGCG CCACGGTCTC CCGTCTGACG GTCGACGTGA CGACGACCGA CCGCGCCGAC 

57421 CTGTCGGCGC GGCTCACCAC CACCGCGGCC GAAGACCAGG GGCCTCTCCG GGGCGTCCTC 

57481 TCCCTCCTGT CCACCGATGA ACGGCAGCAC CCGGATCATC CCGGTGTCGA CCGTGCCACG 

57541 GCGGGCACGA TGCTGCTCGC CCAGGCGTGC GGGGATCTGG TCGTGGCCCG GGGCGTGGAG 

57601 CCGAGGCTGT GGGTCGTGAC CCGCGGGGCG GTCGCGGTGT CCCCCGCCGA GCGTCCGTCG

57661 TCAGCCGGCG CCCAGGTCTG GGGCCTGGGG CGCTGCGCGG CGCTCGAACT TCCCACTCGG 

57721 TGGGGTGGGA TGGTCGACCT TCCCCCGGCG GCCCGGGATG CTGGAAGGCA CGTACGGCGG 

57781 CTCGTGCGTC TGCTGTCGGA GACCTGTGCG GAGGACCAGG TGGCGCTGCG TGCGTCGGGT 

57841 GCGTACGGCC GCAGGCTGCT GCCCGCGTCC AGCCCCTCCG TATCCGTCCC CCGGACCGCG 

57901 AAGAGCGGCT ACCAGCCGCG CGGCACGGTG CTGGTGACCG GCGGAACCGG TGCCCTCGGT 

57961 GGCCACTTGG CACGGTGGCT GGCCCGCAAC GGCGCCGAGC ACATCGTTCT GGCCGGGCGT 

58021 CGGGGCGAGG GTGCTCCAGG AGCCGCGGAA CTGTCGGCGG AGCTCAAGGA GCTGGGTGCG 

58081 GAGGTCACCG TCGCGGCCTG CGACGTGGCG GACCGGAACG CGTTGCGTGA CATGCTGGAA 

58141 TCCCTGCCGG CCGACCGGCC GCTGTCGGGG GTGTTCCACG CTGCCGGTGT CCCGCACTCG 

58201 GCGCCGCTGG CCGAGACGGA TGTGGCGGGG CTCGCCGCCG TGCTCCCGGG GAAGGTCGTC 

58261 GGGGCACGGC ACCTGCACGA ACTCACCAGG GAGAAGGAAC TGGACGCGTT CGTGCTGTAC 

58321 GCGTCGGGCG CCGGGGTGTG GGGGAGCGGC GGGCAGAGCG CGTACGGAGC CGCCAACGCC 

58381 GCACTGGACG CGCTGGCCGA ACAGCGCCGG GCTGAGGGAC TGCCCGCCAC TTCGGTCTCC 

58441 TGGGGCCTGT GGGACGGCGG AGGCATGGCC GGCGAGCGAG GCGAGGAGTT CCTCACCGCC 

58501 CTCGGCCTGC GGGCCATGGA GCCCGAGTCG GCTGTCGCCG CCCTGGAGGA GGCCCTGGAT 

58561 CGTGGGGACA CCTGCGTGAG CGTGGTCGAC GTCGACTGGT CCCGGTTCGC CGAGTCGTTC 

58621 ACCGCCTTCC GGCCCAGCCC GCTGATCGGG GAGCTCCCCG GGGTACGTGC CGTGCCCGAC 

58681 GGATCGGCGG GCGGACCGTC GGACGACCTC GCGGACGCTG CGCGGCACGG CGGGGCAGCC 

58741 GACCGGGGTG TGCCTGCAGG GCTCGCCCGG GCGACGGGCG ACGACCGGCA GGACATCCTG 

58801 CTCGATCTCG TACGCCGCCA TGCCGCCGCC GTCCTCGGTC ACCCGGGACC GCAGCACATC 

58861 GAGCCCGACG CCGGTTTCCG GACCCTGGGG TTCAGTTCGG TCACCGCGGT GGAACTGGCC 

58921 AACAAGCTCG GTGCGGCCGT GGGAACGAAG ATCCCCGCCA CCTTCGCGTT CGACCACCCC 

58981 AACGCCCGTG CCGCGGCGTC CCGCCTCGAC GTCCTGTTGG CGGCGTCGAG CGATGAGACC 

59041 GCGCAGGAGG CGGAGATCCG GCAGGCACTG CGGACTGTGC CGCTGGCCCG GCTGCGGGCT

59101 GCGGGGCTCC TCGACGGCCT GCTCGAACTC GCCGGGCTGG AAGCGGAGCC CGGCCTGCCG 
59161 GGCGACGTAC CGGATCGCGG TGCGGCCACG CCGGACGAGG AGTCCGCCCT GGCGGAAGTC
59221 GACGGCCTGG ACGCCGAAGC ACTGGTCGAC CTCGTCCTCA ACCAGTCCGA CTCCTGACCG
59281 CCGGCGGCGG CGCCGCGGCC CGCCGTGCCG TCGCCGCCCT CGGCCGTACG AAGAACCCCA
59341 CAGACCTGAC CGGGTCACGG CCCGGTGCTC AGCAAGGAGA CCACTCATGG CTCTGTCCCA
59401 AGAGAAGGTA CTGGAGGCAC TGCGCACCTC CGTCAAGGAC GCCGAACGGC TGCGCAAGCG
59461 CAACCGCGAA CTCCTCGCGG CCCGCCACGA GCCCATCGCC GTCGTCGGCA TGGCCTGCCG
59521 CTATCCCGGC GGGGTCCUTT CGCCCGAGGA CCTCTGGGAA CTCGTCGTGT CGGGCACGGA
59581 CGCGGTGGGT CCCTTTCCCG AGGACCGTGG CTGGGACGTG GAGCGGATCT ACGACCAGGA
59641 CCCGTCCGTC CCGGGCACCA CGTACTGCCG CGAGGGCGGA TTCCTTTACG ATGCGGGGGA
59701 CTTCGACGCG GCTTTCTTCG GGATAGGGCC GCGCGAGGCC ACCGTGATGG ACCCCCAGCA
59761 GCGCCAGCTG CTGGAGGCGT CCTGGGAAGC CCTGGAGCAG GCCGGGCTGG ACCCCCGGGC
59821 GCTCCGCGGC AGCCAGGGGG GTGTGTTCGT CGGCGCGGCG AACCAGGGCT ACGTACCGGG
59881 TGACGCGGCG GCGTCGGGAC GTCTGCCGGA GGGTTCCGAC GGCTATCTGC TCACCGGCAA
59941 CGCCGACGCC GTCCTGTCGG GCCGGATCAG CTACTTCCTG GGCCTGGAAG GCCCGTCCAT
60001 GACCGTCGAG ACGGCCTGCT CCTCCTCCCT GGTGGCACTG CACCTGGCGG TGCAGGCGCT
60061 GCGCCGTGAG GAGTGCGAGT TCGCCCTGGC CGGAGGGGTC GCCGTGCTCG CCAACCCGGC
60121 CGCCTACGTG GAGTTCGCCC GGCAGCGGGG ACTCGCCCCG GACGGGCGCT GCAAGGCGTT
60181 CGACGACGCG GCGGACGGTA CGGGCTGGGC CGAGGGCGTC GGCGTCCTGG TGGTGGAGCG
60241 GCTGTCGGAC GCGGTACGCA AGGGGCACCG GGTCCTCGCC GTCGTGCGGG GCACGGCGGT
60301 GAACCAGGAC GGTGCCAGCA GCGGTCTGTC CGTGCCCAAC GGGCCCTCCC AGCAGCGGGT
60361 CATCCGCCGA GCGCTGGCCG ACGCCCGGCT GGAGGCCGGC CAGATCGACG CGGTGGAGGC
60421 CCACGGCACC GGCACTCGGC TGGGGGACCC CATCGAGGCG CAAGCCCTGC TGGACACGTA
60481 CGGAGAGGAG CGGAGCCCCG AACGCCCTCT GTGGGTCGGG TCGTTGAAGT CGAACTTCGG
60541 TCACGCACAG GCGGCAGCCG GAGTCGGCGG CGTCATCAAG ACGGTGATGG CGCTCCGGCA
60601 CGGCCTGCTT CCCCGCACGC TCCATGTGAC CAGCCCGACG CGGCACGTCG ACTGGGGCGA
60661 CGGACAGGTG CGGCTGCTGA CCGAGCCGGT CGACTGGCCG CGGACCGGCG CCCCCCGGCG
60721 GGCCGCGGTC TCGGCGTTCG GCGTGAGCGG CACCAACGGG CACATCATCC TCGAGGAGGC
60781 GCCGCCGCCC ACCCGGCCCG AAGCGGTCCG GCAGGCCGGG GAGCGGCGGC CGGTCCTGGT
60841 CCCGTGGACG CTGTCCGGCC GTACGAGGCC GGCGCTGTGC CGGCAGGCCG CGCGCCTGGC
60901 GGCGCACCTC GAACAGCACC CGGACCTCGA CCCGCTGGAC GTCGGGTTCT CGCTCGCCAC
60961 GACGCGCACC CACTTCGAGC ACCGGGCCGT GCTGCTCGCG GACGCCGCCA CCGAGGGCGG
61021 CTCCCGTGCC GACGCGCTCG GGGCGTTGCG GGCGATCGCG GAGGACCGCG ACCCGGGCGG
61081 GGCGGTACGG GACACCGCGC GGGGCGAAGG GCGTATCGCC TTCCTGTTCT GCGGGCAGGG
61141 CAGCCAGCGG CCCGGCATGG CGGAGCAGCT GTACGCGCAG TACCCGGCGT TCGCGCGGGA
61201 ACTGGACACG ATCGCGACGC ATCTGGACGC CCATCTGGAC CGTCCGTTGG CGACGGTGAT
61261 GTTCGCGCCG GCCGGTACGG CGGAGGCCGC GCTGCTCGAC GGCACGCAGT ACGCCCAGGC
61321 GGCCCTGTTC GCCGTAGAGG TCGCGTTGTT CCGGCTCTTC GAGGGCTGGG GGCTGCGCCC
61381 CGACGTACTG CTGGGCCATT CCGTGGGCGA GCTGGCCGCC GCCCACGTGG CCGGGGTGTT
61441 CGGGCCGGCG GACGCCTGCT CGCTGGTCGC CGCACGCGGC CGGCTCATGC AGGAGCTGCC
61501 GGCCGGCGGC GCGATGCTCT CGGTCCGTGC CGCCGAGCAC GAGGTGCGGG AGCTGATCGC
61561 CGGGCAGGAG GACCGCATCG CGGTGGCGGC CGTCAACGGG CCCCGCTCCG TGGTCGTATC
61621 CGGGGACGAG GACGCGGTCT CGGCGCTCGC CGAGGAGCTG ACCGAATACG GCGTGCGCAC
61681 CAAGCGCCTC AACGTCAGCC ACGCCTTCCA CTCCCCACGT CTGGACTCCA TGCTGGAGAC
61741 GTTCCGCCGG GTCGCGGAGA CGGTGGAGTA CCGCGAGCCG ACGCTCGACG TGATCAGCGG
61801 CCTGACCGGC CGCCCGGCCG ACGCCGGGGA ACTCGCCACC GCCGACTACT GGGTCCGGCA
61861 GGCGAGGGAG ACCGTCCGGT TCCACGACGG GGTGCGCGCC GCGCACGCGC GCGGCGTCAG
61921 CACCTTCGTG GAGCTGGGGC CGGACGGCGT GCTGTGCGGC CTGGCCCTGG AGACCCTGGC
61981 GGAGGAGACC GACGGGGAAG CGGCCGCCGA GACGCCCGGC CGGGCGCGGG CGGCGCTGGT
62041 GCCCGTGATG CGTCGGGAGC GGCCGGAGGG CAGTACCCTC CTGACGGCGC TCGCCACGGC
62101 CCACGCGCGC GGGGCGGAGG TGGACTGGTC CCGGTTCTAC GCCGACACCG GCGCCCGCCA
62161 CACCACACTG CCCACCTATG CCTTCCAGCG CCAGCGGTTC TGGCTGGAGA CGGCGGCCCC
62221 CGCCGCGCCC GCGGCGGGCC AGGGGGCCGG ACCGGCCGAC CCGCAGGACA GCACCGGTCC
62281 GGCCGCCCGG CCCACGCTGA CGGAACAGGA CCTCCTCCTG CTCGTGCGGA CGGAAGCGGC
62341 GGCCGCACTC GGCCACGCCG AACTGGAGGA CGTACCGGCC GACAGCCTCT TCGGCGACAT
62401 CGGCTTCGAC TCGCTCGCGG CCATCGAACT GGGCGCCGCC CTGACCGGCG CCACCGGGCT
62461 GGAAGTGCCG TCGTCCCTCG TCCTCGACCA CCCCACGCCC AGGGAGCTGG CCGCGCACCT
62521 GGCAGCCGCC CGGACGGCCG CCGACAGCGA CGACACGTCC CCCGAAGGCC CGGACACGGC
62581 CGGTGAGAGC AGCCTGTCGG CGATGTACCG GCGGGCCGTG CGGCTCGGCC GGGCCGAGCC
62641 GTTCATCGGC ACACTCGCCG AACTCGCCGC CTTCCGGCCC GTCTTCCCCG CCGATCACAC 
62701 CCTCGCGGAC GGCGAGACCG TCGGACAGGC GGCCGCCGCC TGGCAGCCGG CTCCGGTGCG 
62761 CCTGGCCACC ACGGACGGTG AGGGACCGGA GCTGATCTGC TGCGCGGGTA CGGCGGTGGC 
62821 GTCGGGACCG GAGGAGTTCA CCGCGCTGGC CGCGGCCCTG GGCGACCGGC TGACCGTGTC 
62881 GGCACTGCGC CAGCCCGGCT TCCGCGCGAA CGAGTTGCTG CCCGGCTCCC TGGACGGGCT 
62941 GCTCGACGCG CAGGCGUACG CGGTGCTGCG GCACACGGGT GACAGGCCCT ACGCCCTCCT 
63001 CGGCCACTCG GCGGGCGGGG CGCTGGCGCA CGCGCTGGCC TGCCGACTGG AGGAGCTCGG 
63061 CGCGGGTCCC GCGGCGCTGG TCCTGGCCGA CGTCTATCTG CCCAGCTCGC CGGGGGCGAT 
63121 GGGGGTGTGG CGCAACGAGA TGCTCGACTG GGTCATGCGG CGTTCCGTGG TGTCCATCGA 
63181 CGATGCCCGG TTGACGGCCA TGGGCGCCTA CAACCAGATG CTCCTGGAGT GGACACCGCG 
63241 GCCCACGAAG GCGCCGGTCC TGTTCCTGCG CGCGACGGAG CCGGTGAGGC CGTGGTCCGG 
63301 AGAACCGGAG AGCTGGCGGG CGCACTGGGA CGGCGGCGAC CACACCGCCG TCGACGTGCC 
63361 CGGCACCCAC CTGACGCTGA TGACCGAGCA CGCCCGCCAC CTCGCGGCGA CCCTCCACAC 
63421 CTGGCTCGGC ACCCTGTGAA CCACGCCCGG GGCGGCTTCG CCGCGCGTAG GACTGCCGCC 
63481 TCCCCCGACT TCCGTACACC GCGACACCTT GGAGGACTCC CGTGACAACG CAGTGGACCA
63541 CCCCGTCCGT GCTCGGCCGC AGACTGCAAC GCACCTACGT GGGGCACTGG TTCGCAGGAA 
63601 CGCAGGGAGA CCCCTACGCG CTGATCCTGC GCGCCCAGCG GGACGACACC ACCCCCTACG 
63661 AGGAGGACGT CCGCGCACGC GGACCGGTGT TCCACAGCGA GGTGCTCGAC ACCTGGGTGA 
63721 TCACGGACGG CGCTCTCGCC CGGTCCGTCC TGACCGACGC CCGCTTCGGC GGGCTGACGC 
63781 GCGCGGGAGG ACGGTATCGC GCGGAGCTTC TCCCTCCGGC GGGCCCCGAG GTCGGTCCGG 
63841 CCCGCGCAGG GGTACGCGGC GGCGTGCGGG CCGACGCCGA TCCGGCGGTG TCGGCGCAGG 
63901 ACGAGGTGGT GGTGGAGGCC CTCGCCGAGC AGCTCTCACG CACCCTCCTG GGCGGACTCG 
63961 GCGACGACTT CGACCTCGTC GCCGCCTTTG CGCGACGCCT GCCGGCACAG GTCCTGGCGG 
64021 AATTCCTCGG GCTGCCCGCA GCCGCGCGCA GCCGGTTCGA GGAACTGCTG GCCGGCTGCG
64081 CCCACAGCCT CGACAGCCGG CTCTGTCCGC AGACGCTCGA CATCACACGG ACCGGCCTCG 
64141 GAGCGGCGGC CGAGCTCCGG GAACTGCTCG CGCGCCACCT CGGCGGGAGC GGACCACGCT 
64201 CCGCTCAAGC GGCAGTCTCC CTGGCAGTCG AGGTGGCCGC ACCCGCCGGC GCGCTCATCT 
64261 GCAACGCGGT CGAGGCGCTG AGCAGCTCTC CCGGGCAGTG GAACGCCCTC CGCCAGAACC
64321 CGGAGAAGGC CGACGCCGTC GTGGCGGAGA CCTGGTGGCG ACGACCGCCG GTGCGGGTGG 
64381 AGAGCCGGAT CGCCCAGGAG GACGTCGACG TGGCCGGAGT GCCCGTCCCC GCGGACGGGC 
64441 ACGTGGCGAT CCTCGTCGCC GCCGCCCAGC GCGACCCGGC GATCACCCCG GCCCCGACGA 
64501 AGGACGACAC CGGCACCCCC GGACAGGGCG ACTGCGGCGT GCCCCTGGGG CTCGTCGGCG 
64561 ACGCGCACGC CACCTCCGCC GCCCGGACGG TCCGCGCCCT CTGCCGCGGT GCGCTGCGAG 
64621 CGCTCGCGCA GGAGGCACCG GGCCTGCGGC CGAACGGGAC CCCGGTGCGC CTCAGGCGGG 
64681 CACCCGTCAC GCTCGGCCAC GCCCGCTTCC CCGTCGCCCG GACGGGCCGG GGGACACCGA
64741 CCGACGCGGG CGCGGCATGA GCACCCGCGA CGACCACCGA CTGCCGAACG GGGAGACGAG 
64801 CCGATGCGCG TCCTGATGAC GTCGATCGCC CACAACACGC ACTACTACCA CCTGGTGCCG
64861 CTCGCCTGGG CCCTGAAGGC CGCGGGCCAC GAGGTGCGCG TCGCCGGCCA GCCCCGCGTC 
64921 ACGGACATCA TCACCGGGTC CGGACTGACC GCCGTGCCGG TCGGTGACGA CGAGGACATG
64981 ATGGAGCTGT TCGCCGAGAT CGGCGGAGAC ATCACCCCCT ATCAGGAGGG ACTGGACTTC 
65041 GCCGAGGAGC GGCCCGAGGC ACGGTCCTGG GAACATCTGC TCGGACAGCA GACCGTTCTG 
65101 ACCTCGCTGT GCTTCGCACC GCTCAACGGC GACTCGACGA TGGACGACAT CGTCGCGCTG 
65161 GCCCGCTCCT GGCAGCCGGA CCTGGTGATC TGGGAACCCT TCACCTTCGC CGGAGCGGTC 
65221 GCCGCCCACG CCGTGGGCGC GGCGCACGCC CGCGTCCTGT GGGGTCCCGA TGTCATCGGC 
65281 CGGGCCCGGG AACGGTTCGT GGAGGCCAAG GCACAGCAGG CTCCCGAACA CCGGGAGGAC 
65341 CCGATGGCCG AGTGGCTCGG CTGGACCCTG GAGAGGCTGG GCCTCCCGGC CGCCGGAGAC 
65401 GGGATGGAGG AGTTGCTGAA CGGCCAGTGG GTCATCGACC CGGGCCCGGA GAGCGTCCGG
65461 CTCGACCTTC GCGAGCCGAT CCTGCCCATG CGTTTCGTTC CCTACAACGG ACCTGCCGTC 
65521 GTCCCCGGAT GGCTGTCCGA GAAGCCGAAG CGACCGCGCG TCTGCCTCAC CCAGGGAGTG 
65581 TCGGGACGCG AGACCCACGG CAAGGACGCC GTCCGCTTCC AGGACCTGCT CGCGGCGCTC 
65641 GGCGACCTCG ACATCGAGAT CGTCGCCACC CTGGACAGCA CCCAGCGGGA GAACCTGACG 
65701 GAGGTCCCCG ACAACGTCCG GATCGTCGAC TTCGTCTCGA TGGACGTGCT GCTGCCGAGT 
65761 TGCGCCATGA TCATCTACCA CGGTGGCGCC GGCACCTCGG CGACGGCCCT CCTGCACGGC 
65821 GTTCCGCAGG TCGTCATCGG AGCGCACTGG GACGTGCCGG TCAGGGCACG GCAGCTCGAC 
65881 GACCTGGGCG CCGGCATCTT CATCCGGCCC GAGGACCTCG ACGCCGCCAC ACTGCGCGCG 
65941 GCGGTTCAGC GCGTGCTCAC CGAGCCCTCC CTCCAGCGGG CCGCGGACCG GCTGCGGGCC 
66001 GAGATGCGCT CCAACCCCAC GCCGGCCGAG ACCGTCACGG TGCTGGAGCG GCTCTCCCGG
66061 AGCCACCGAC AGCCCCGCTG ACCACACGCG GTACACGGTG CGGGCCCACG TGCCGGGGGC 
66121 TCCACCGTCG CCGGCGGTCG TCGGATCGCC GTCCGGCCAT GTCCCGGCAC CCAAGGACGG 
66181 AGCAGAGCAG AACATGGAAT TCGAAGGTCA GGTCGCGCTC GTCACCGGGG CCGGCAGGGG 
66241 GATCGGCCGT GCGACGGTCG TCCGCCTCGC GGAGGCCGGA TGTGACATCG CCCTCCACTA 
66301 CAACCAAGCG AAAGCGCAGG CCGAGGAAGT CGCCGAGCGC ATCGCCGCAC TGGGCCGCAC 
66361 GGTCGAACTG TTCCCGGGCG ACCTCTCCCG CCCCGAGACC GGGCGACAGC TCGTGGCCGC 
66421 GGTGCAGCAG AAGTTCGACC GGATCGACAT CCTGGTGAAC AGCGCGGGCA TCACACGGGA 
66481 CAAACTCCTG CTGTCCATGG AGGCGGACGA CATCCACCAG GTCATCGCCA CCAACCTCGT 
66541 CGGCCCGATG TTCCTCACCC AGGCGGTCGC GCTCACCATG CTGCGTCAAC GCTCCGGGCG 
66601 CATCGTCAAC ATCTCCTCCG CCGCCGCGAG CAGGCCCGGA AAGGGCCAGT CCAACTACGC 
66661 CGCGTCCAAG GCCGGTCTGG AGGCCTTCAC CAGGGCCATG GCGGTGGAAC TCGGATCCCG 
66721 CGGAATTCTC GTCAACGCGG TCGCTCCCGG CATCGTCAAG ACCGGCCTGA CCGAGGCTCT 
66781 CCGCGAGGGG GCGGAGCCCG AACTCCTGGC CCGGCAGGTG ATCGGTTCCT TCGCCGAACC 
66841 CGAGGCGGTG GCGGAGGCGG TGGCCTACTT GGCGAGCCCG CGCAACACGC ACACGACGGG 
66901 CACGGTCCTC ACCGTCGACG GCGGGCTCAA GATGGTGTGA GGCCCACCGG GCATCGGACA 
66961 GCCGGTGGAT CCGCCGGAGG CGGAGACCCG CGCACCGCGG GCGTCGACCC GCGCACCGCG 
67021 GGCGCGCGTG GGGGCGCCCG CGGTGCGCGA AGGCCGCCTC TGGCTCGCGT CACCGGAAGA 
67081 AGCCCGATTC CCCAGCGGGC GACCGTAGCC GGAGCGTTGG ACCGCCCTGC TCCGCACGTC 
67141 GAGCCCCACC GTGGTAGCGG CGGACATGCC CAGGGGGCGC ACGCCCGGAT CCTGCCGCAC 
67201 GACCAGGCGC ACCAGGGTTT CGGCGAACAC GGCGGCCGAC GCCGCGGGGT TGCCGCCGCG 
67261 GCACGGCCGA CGCGCGCTGG TCGCACGGGG TGAGCCCGCG GGAGCGGAGG GCGCCGGTTC 
67321 GCTCAGGACG CCGGGATCTC CGCCGGGACG ACGGACTCGG GGGCGCCGTC GGGGCCGGGC 
67381 CTCGTGCCAC CGTCGGCGTC GAGCTGCGGC GGATCGGAGA GCTGCTCGCG CACGTACCCC 
67441 CACACCACGG CGACGAGGGC GGCGACGGGC ACGGCGAGGA GGCTGCCGAC GATGCCCGCC 
67501 AGACTGCCGC CCAGGGTGAC GGCGAGCAGG ACGACGGCGG CGTGCAGTCC GAGTCCGCGG 
67561 CTCTGGATCA TGGGCTGGAA GACGTTCCCC TCCAGCTGCT GCACCACCAC GATGATCGCG 
67621 AGCACGATCA GGGCGTCGGT CAGGCCGTTG GAGACCAGGG CGATGAGCAC GGCGACGAAT 
67681 CCTGCTAACA GCGCGCCGAT GATCGGCACG AAGGCGCTGA CGAAGGTGAG TACGGCGAGC 
67741 GGCAGGACGA GCGGGACGCC GAGGATCCAC AGGCCGATGC CGATGAGGAC GGCGTCGAGG 
67801 AGGCCCACCG CGGCCTGGGA GCGGACGAAG GCGCCGAGGG TGTCCCAGCC GCGTTCGGCG 
67861 ATCGTGGTGG CGTCGGTCGC GAGGCGGCCG GGGAGCTGAC GGGCGAGCCA GGGCAGGAAG 
67921 CGCGGCCCGT CCTTGAGGAA GAAGAACATC AGGAAGAGCG CGAGGACGGC CGTGACGACC 
67981 CCGTTCACCA CCGTGCCGAC GCCGGTGGCG AGGGTGGTCA GCAGGGATCC GACGCTGTTC 
68041 TGGAGACGGT CGGTCGCGGT GTCCAGGGCG CCGGTGATCT GGTCGTCGCC GATGTTCAGG 
68101 GGCGGACCCG CGGTCCACTC GCGGAGACGC TGGATGCCCT CGACGACACC GTCGGCCAGT 
68161 TCGCCGGACT GCGAGGCGAC GGGCACGGCG ATGAGCGCGA CGGTGCCGGC GGTGACGGCG 
68221 AGGAAGAGGA CGGTGACCAC CGAGGCGGCG AGCGCCGGCG GCCAGCCGAG ACGGCGCAAC 
68281 AAGCGGGCGA AGGGCCAGGT CAGCGTGGTG ATCAGCAGAC CGATGACGAG TGGCCACACG
68341 ATCGACCACA TCCGGCCGAG CAGCCAGATC ACCGCCGCCG TGCCCAAGAG CACCAGCAGA 
68401 AGCTCCGTCG ATATGCGGGC CGAGGCCCGT AGCGCGGCAC GTGTTCTCGC AGGACTGAGA 
68461 GAGGCAGACA TGGCGATCAC CCTAGAGCGG CCCGGGCGGC CCGCTGCCCC CGTGCCCCGA 
68521 TCCTTCGCGC CGGGGTGACG CGCATCGGGG GTTCCGGCAC TGCCTGAGCG CCTGCACGGA 
68581 CGGGGCGGGT TACGCCGAGG GGAAGCAGCC GGCTCCGGAT CGACAGGAGT GCGGGTGACG 
68641 GTCGTACACC GGCGCTACGT CGCCCACTCC GCGGCGCACA AGGGCGGTCG CCGTCTGGCC 
68701 CCTCCCCGCG TCCGACGACC GCCCACCCCT CGTCAGGGAG CGGGGTCCGG GGCGAGGTGG 
68761 ATACGGGCCG CTACCGGGAG GTGGTCGCTG GCCGTCGCGG GAAGGGTCCA TGCCGAGGCG 
68821 GCCCGGACGC CCCCGAGAAG GATCTGGTCG ATCCGGACGA CCGGCAGGCG CGCCGGCCAG 
68881 GTGAAGCCGA AGCCCGCGCC CGCCGCGGCC TGTGCGGAGA CGAGACGGTC GGTGAGGGGG 
68941 CGCAGTGCGC GGTCGTCCGT GGAGCCGTTG AGGTCGCCCA GGAGGACGAC GCGCCGGACG 
69001 GGCTCGGCCC GCACCTCCGC CGCCAGCAGC CCCAGCGCCT CGTCGCGCGC CCCGGCGGTG 
69061 AATCCGCCCG GGCCCACGCG GACGGACGGC AGATGGGCGA CATAGACGGC CAGCGGCCCG 
69121 CCGGGCGCGT CCACCGTGGC CCGCATCGCC CGCGTCCACG GCATGATCGG CACAGCCCGC 
69181 GCGTCGCTCA GCGGATGCAC GCTCCACAAG CCCACCGTGC CCTCGTAGAA GTGGTACGGG 
69241 TACGACTCCG CAAGGGCCCG CTCGTAGGCG GGCGCCGTGG CCGGGCTCAG CTCCTCCAAC 
69301 GCCAGTACGT CCGCCCCGGC GGCGAGCAGG CTCCGCACCG TTCCGGCGGG GTCGGGGTTG 
69361 GCCTGCTCGA CGTTGTGACT GACCAGCGTG AGGTCCCCAC CGGGCGTCGT CTTGTCGGTC 






69421 AGCGCTCCGC CGAAGGCCGT CAGCCAGGCC ACGGCCGGCA CCACCAGGGC CACCACCGCC
69481 GCCACCCGCG CGCGCCACAG CAGGGCCGCG GTCACCAGCA CGGGCACGGC CAGCGCGCTC
69541 CAGGGCAGCA AGGTCTCTGC CAGACTGCTC AGCCGTCCGG GCAGTCCGGG GAGCCGCCCA
69601 TGGCCCGCGA CGCTCACCGC GAGGAAGACG GAACAGCCCG TGACCGCCGC CCCGCGGCGT
69661 CGCCACCACC GCCGGCGCGG ACCGGCGTCC CCCACGCCGA CCGCGGCTCC GGGAGCGCGG
69721 CCCCTGCCGT CCGCCGTGCC GTCCGCAGCC AGGGAGCAGA CGGTCTGTTC CGCCCTGCCG
69781 TCCGCCGCAG TGGAGGGGGC GCTCTCCGCT GCCCCGCTGT CCGGCGCCGC CCGGGAGGGG
69841 ACAGTCTGTG CCGCCCTGCC GTCCGGCGCC GCGAGGAAGA CGGAACAGCC CGTGACCGCC
69901 GCCCCGCGGC GTCGCCACCA CCGCCGGCGC GGACCGGCGT CCCCCACGCC GACCGCGGCT
69961 CCGGGAGCGC GGCCCCTGCC GTCCGCCGTG CCGTCCGCAG CCAGGGAGCA GACGGTCTGT
70021 TCCGCCCTGC CGTCCGCCGC AGTGGAGGGG GCGCTCTCCG CTGCCCCGCT GTCCGGCGCC
70081 GCGGGGGAGG GGACAGTCTG TGCCGCCCTG CCGTCCGGCG CCGCGGGGGA GGGGACGGGC
70141 TCCGCCGTCC CGCCGTCCGG CACCGCGGCC GGATCCCGGC GTGTCGTCGC CGTCATCGTT
70201 CCCCGCCCTG GGTTCCGGCG GCGGCCAGCC GCTCGCGGAC GGCGGTGAGC AGGCCACGGG
70261 CCGCCTCGAC CGCGGCCCGG AGCCCCTCGG CGGGCGTCGT GTTCGCCCGC TGGTGCAGGA
70321 CGGCGCCCAC CAGCAGCCGG GGCGCGCCGC CCACGCGCAC CTCGTAGGCC CACAGGAGGT
70381 TCCCGCCCGC CGGGGTGCTG GAGCCCGTCT TCACACCGAG GACGCCCGGG GTGTCCAGCA
70441 GGGGATTGGT GTTGGTGATC GTCCCGAGGC CCGGGACGGT GGTCTCGCGG GTGGCGACGA
70501 CCGCCCGGAA GACAGGGTCC TCCATCGCGG CCCGCGTCAG CCGCACCTGG TCGGCCGCGG
70561 TGCTTGTGGT CGTCGGCTCG ATGCCGCTCG CCCCCGTGTA GACCGTGTCC TTCATCCCGA
70621 GCCGTACGGC GGCGCGCCGC ATCTTCGTCA CGAAGGCCGC CTGGCTGCCG GAGTCCCAGC
70681 GGGCCAGCAG ACGGGCGACG TTGTTGCCGG AGGGGATGAG GAGCAGTTCC AGGAGCCGGC
70741 GCTGGGAGTG GCGTTCGCCG GACCGTACCG GGACGGTCGA CTCGCCGCCG ACCCCCGCCT
70801 CGTGCGCGGC CGTCCGGTCC ACTTCGATCA GGGGGCCGTC CTCGTCGGGC CTCAGCGGAT
70861 GCTCTTCGAG GATGACGTAG GCGGTCATCA CCTTCGTCAG GCTCGCGATC GGTACGGGCC
70921 GCCGCTCGCC CCGCTCGCCC AGCGAGCCGG TTCCTTCGAG CTCGACGGCG CTCTGGCCGT
70981 CCTGCGGCCA GGGGAGCGGC CCGATGTCGG AGACCGGTAG TCGCTCGCCT CCCGCCGCCG
71041 GAGGCGCGCC GGAGGGCGAG GCGACGGTGA TGCCCAGGGC CGTCAGCAGG GCCACGGACA
71101 GGGCACCGCC GACGAGGCGG TGACGGGGGG TGCGGGGCAG GGACACGGGC CGCCTCCAGG
71161 GGCTGCGGTA CGGGATCGGT ACGGCAGCAA GACTCCGGGG GCTACGTCTC CGCCTCACGG
71221 TCGGGAGAGC GCGGCCCGCG CTCGGGAACC CATCGGTCGT GTATCGGCGG GGCGGCGGCG
71281 ACCGGCCGCC GGGCGACGCG GAGGGAGCGC CTGAGGGGCG GGGCGTACCG ACAGGCGACC
71341 GTCTGGGGTG GGGAGGCCCG CGGGCTGTCC CGGGGACCGG TTCACGCCTC GGACGTCTGC
71401 CCGTCCTCGG GCAGGCTCAG GGTCGCGACC GCTCCTCCGT CCCGGGCGTT GGCGAAGGCC
71461 AGGGTCGCGC CGATCACCCT CGCCTGGCCG GACGCGATCG TCAGGCCAAG GCCGTGCCCG
71521 TGCCCCCGTT CGGCCGAGCC CGTGCGGAAC CGCTGGGGGC CGTGGGACAG CAGGTCGGCG
71581 GGGAAGCCGG GGCCGTGGTC CCGGACGGTC ACGGTCCGGC CGGCGACCGT GACCTCCACC
71641 CGACCCGCGC CGTGCCGGTG GGCGTTGACG ACGAGGTTGG AGACGATGCG GTCCAGGCGC
71701 CGGGGGTCCG ACTCGACCAC TGCCGCCCCC TGTCGCGTCA CCTGCGCCGC GAGGCCCGTC
71761 CGCGCCACCG AGTCCCGGAC GAGGGCGCCC AGGTCGACCG GTCCCTGCTG GGCCGTCTCG
71821 GCGCCCGCGT CGAGCCGGGA GACCTCCAGG AGGTCCTCCA CCAGGTCGCG CAGCACCCGT
71881 ACCCGGCTCT GGACCATGTC CGTCACCTCG CCCTCGGGCA GCAGTTCCGC CGAGGTGACC
71941 AGGCCCATCA GCGGGGTGCG CAGCTCGTGT GCGACGTCGG CGGTGAAGCG CTGCTCGGTG
72001 TCGATCCGCT GCTGGAGGCT GTCGGCCATC GAGTCGACCA CGGCCGAGAT CTCGGCGACC
72061 TCGTCGCCTC CGCGGACCGT TCCCGTCCGG GCGTCGAGGT CGCCCGCCGT GATGCGCCGG
72121 GCCGTGCGGG CGACCCGGCG CAGCCGCCGG GCGGGGAGTT CCGTCGCCAG GGCCGTGGCG
72181 GGGACGACGA CGCCCAGGGT GAGCAGCGAG TACTTCCACA TGTGACGGTC CAGGGCCTGC
72241 CGGGTCAGCA GGTCGGCGGT CATGTCGACC TCGACCGCGT ACAGCTTCCC GCCCTCCCGG
72301 CGGGCCGCCC GGAAGACGGG GGCCGGGGGC CCGTCCTCGT AGAGGGTGGC CTCGCCGCCG
72361 TGCTCGATCT GCCTCAGCAG GGCCTCGGGC AGTTCCTCGG GGGACACCCG GGGCCCTTCC
72421 TCACCTGCTG CGTCGGCGTC CTCCAAGGCC GTTGCCAGCG CCACGTGGGC CCTGCCCGCG
72481 CCCTCGTGCA GGGAGCGCCG CAGCACCGAG TCGTGCACCA GCACTCCGAC GGTCAGCGCC
72541 ATCGAGGCAC AGGCGAGCGC CACCAGGAGG ACGATCTTCC AGCGCAGCGA GCGGGAGCGC
72601 GGTACCAGGC CCGTGACGGC TCCCCGGGGG GCCCGGCTCA CCGCTTCCAC TTGTAGCCGA
72661 AGCCCCGGAC CGTCTCGACG CGCTCGGCGC CGATCTTCTT GCGCAGCCGC TGCACGCACA
72721 GGTCGACGAC CCGGGTGTCC CCGTCCCAGC CGTAGTCCCA CACCTCGCGC AGGAGGGTCT
72781 GACGGTCCAG CACGATCCCG GGGTGCGCGG CGAACTGGAG CAGCAGCCGC AGCTCGGTCG
72841 GGGCGAGCGC GATCCGCTCG CCGCCCCGGC
72901 GGTCGCCGAA GAGCAGCGGG CCGGCCGGCG 
72961 ACACGAAGGC GGCCCGGCGC AGCAGCGAAC 
73021 CGGGCTTCAC CACGTAGTCG TCGGCCCCGG 
73081 CACCGCGCGC CGACATCATC AGGATCGGGT 
73141 GTCCGATGCC GTCCAGGCCC GGCAGCATCA 
73201 CCCGGAAGAG TTCGAGCCCG GTCAGTCUGT 
73261 GCTCCAGGGA CATCGCGACG GACCGCCGGA 
73321 TCACGGTCGC AGGCGCGGGC GCCGACACGG 
73381 GCGGGGCGGG CCGACGCGCG GCCCGGCCCT 
73441 GCGCCGCCGG AGAGCCGTCG GCACGCCTCT 
73501 TGGCGCGGTG ACCGACACAC GCCCGAGCGG 
73561 AAGATGCCGC GGGAGCCGAT CCGGCGCCCA 
73621 CCGCACGCCC GCGGGGGCTC GTGGCGGGTG 
73681 CTCCGGAGCG GGGTCGGTCC TGCGGCCGAG 
73741 GACCACACAT CCGGTGCTGA TGCCTGAGTC 
73801 GTGGTAGAAG TCCGGGGCGG CGATGGGGAT 
73861 GAGGACGTTG TCGCCGCGTT CGAGGGCGGC 
73921 GACGGTGCCG AAGAGGGCGA TGCCCACTCC 
73981 GACGGAGGCG AGCAGCGGGC AGAGGCCGAG 
74041 GACGAAGCGG CTGCGGACCT TGGTGATCGC 
74101 GGCGGCGAAG CCGTTGAAGA GCGGGCTGAG 
74161 GGCGGCCAGG GTCTTCTCGT CGGCCGGCCG 
74221 GGCGGTGGAC TCGGTCATCG ACACGAGCAT 
74281 CGCGAACTGC GGAGCGCCGA AGTGGAACGG 
74341 TACGGCGCTG AAGTCGGCGA CGCCCAGCGG 
74401 CAGCAGGATG GAGATCTGCT TGAGGAAGCC 
74461 CAGCAGGGTG GCGGTGGCGA GGCCGATGTA 
74521 GTTGCCGCCC TGGGCCCAGT TGAACGCGAC 
74581 GACCGTGCCC GTGACCACGG GCGGGAAGAA 
74641 GAGGAATCCG AAGACGCCGG CGACGATCAC 
74701 GGGGCCTTCG GCCTTCGCTA TCGCGAGCAT 
74761 GACGAACGGG AGCCGTGCTC CGACCTTCCA 
74821 GCCCGAGGTG AACAGGCTGG CGCTCATGAG 
74881 GCCGATGCCG ACGACCAGCG GCGGCGCGAC 
74941 CAGACCGGCG CTGAAGAGCT TCAGCGGGGG 
75001 CGGTCTTGCG GGGGTGTCGG GACGAGCGTC 
75061 GGCGGGGCGG GATTCCGGCA TGGCGTGTCT 
75121 GCAGCCCTGT GGGAGAGCCG GCGTCGCTGA 
75181 GGGTGGGGAG GATGCGGCAC CGGGCGTGGG 
75241 CCGGAGAGCG GGGAGTGGGG CGCCGGAGAA 
75301 CCGCGGTCCT CCGGCCGGCC GGCGCGCAGC 
75361 CGGCGAAGTC GGAACAGTTC CTCACGGGGC 
75421 GGTTGAGGAC CTCCCGGACG TGGTGCCAGA 
75481 CCGCCCGGGC CTCCGCGCGG GCCGCCTGAC 
75541 GCTGGTAGGT GAGGTGGAGG CGTTCGAGGG 
75601 CGATGCGGGA AAGGGCCCCC ACGTAGGGGG 
75661 GGACGCGGAG CCGGCCGGTC CTGCCGCACG 
75721 GGACTCTCAG GAGGGCGTCC GTCTGGGCGG 
75781 CGTCCTCCCA CTGGCCGCGG GTGAGGCCGA 
75841 CGGCGACGGT ACGGCACTGG GCGAGGGTGC 
75901 ACCAGAGGGC CGCCGCGAGG GCCTTCACCT 
75961 CCCACGCCTC GACGAGGCCG GGGTGCGCGG 
76021 CGACCTGGCC GAGCGTCAGG CCGAGCTGGG 
76081 AGCGGGGCAG GGCATACGGA TCGTCTGCGG 
76141 CGGTCCACGG CCAAGGGCCG GAACGGTCTC 
76201 TCTTCGTCGG ATCTCCTGAG CGGGCAGACC 



GGACCTCCAG GGCCGCGGGG TCGAGGGAGA 
TCGCGGGGTC GGCGGGTCCG GGGGCGGGGG 
GGATGCGCGC CACCAGGACG GCGGTGTCGA 
CCTCCAGGCC GGACACCACG TCGAGGGCGT 
CCGTGGCCGT CTCCCGGATG CGCCGGCACA 
CGTCGAGCAG CACCAGGTCG TGCCGTCCCT 
CGGCCGCGAC GCGCACGCGG TAGCCGTAGC 
TCACCTCGTC GTCCTCGACC AGCAGGACGG 
AATCAGACAT GTCCATCTCT CGGGCACGGG 
CCATCATGCC TCACTCGCGC GCCTCCCCCG 
GAGCTGGTCT GATACCTGAC TGATACATCA 
GGACGGACAC GGAGGCGGCG CGCCGGACAC 
CCGAGGCCGT GTGAACGACG CCGTCCGGAA 
TCAGTGGTGT GCGGCAGGCG CGGGCAGCGT 
GTGGTTGAAG GCGAGGTTGA GGAGGACGGC 
GAGGACGATG CGGGCGCCTT CGGGGAAGGC 
GATGCCGGCG CCCAGGGAGA TGGCGACGAT 
TCCGGCGAGG GTCTGGATGC CGCTGGCGGC 
GCCGAGGACG GGCTGGGGGA TGAGCGCGAC 
CAGGAGGAGG ATGCCGCCGG CCGCGGCTAC 
GACGAGGCCG ACGTTCTGGG CGAAGGCGCT 
GGCGGTGCCG AGGCCGTCGG CGCGGAGCGC 
CTCGACGATC TCACCGAGGG CGAGGACGTC 
CACGATGCAC ATGGAGATGA TCGCCGCGGC 
GGTGGGGAGT CCGATGACGT CGGCGTCGCC 
CAGCGAGAGG AGCGTGCCGG CGACGAGGCC 
CGTCAGGACG CGGCGCAGGA CCACGGTGAT 
CGTGAGGCTG CCGTAGTCCG GTGCCTGCGC 
GGGCAGCAGG GAGACACCGA TGAGGGTGAT 
GCGGATCAGT TTGCAGAAGA AGGGGGCGAG 
GGCGCCGTAG ATGACGGGCA GCGCGTCGTC 
CGGTGCGACG CCGGCGAAGG AGACGCCGTT 
GAAGCCGAGC GTCTGCAGGA GCGTGGCGAT 
GAACGCGATG TCCGCGGTGG ACAGGCCGAC 
GACGCCGGCG TACATGGCGG CGACGTGCTG 
CAGCATCTGG TCGACCGGAT GGGTGTCGCC 
GGTTCCGTCC TCGGTCTTCG TCGGTTCCGG 
CCTGGCCGCG GCGGTCTCGT GGGAGCGCGG 
CGCACGGCTC TGCGGATGGG GGTGGTGACG 
ACGAGAGGCC GTCGTGCCTG GGTGCCGCGG 
CGGGCGGGAG GGATTCGTGT CCCGGGGTTC 
TGTCCGTGAT GCGCGTCGTG CCACTCCGTC 
GGCGTGGGTG TCAAGAGGCG GGACGGGCCG 
GGTGCAGGTC GATGTGTTCG AGGAAGTCCG 
GGGGCGGGCC GCCGGCCGCG CCCTCCCATT 
CGTGGCCGAG TACGGCGGGG TCGAGCGGCA 
GCCACCAGGT GGTGGCGGCC TCTCTGAGCA 
CGGCGACGAA GCAGGCCGGT GGCGGGCAGA 
GGGTCCCCTG CCAGCGGTTC CGGCGTTCGG 
GCTCGGTCGC GGTCTCGTTG ACGGTGAGTC 
CGGGCTCTTC GACGAGATCG ACCGGAGAGC 
GGGTGTTGTC GGGGGTGGCG GTGCCGGCTT 
GGTGTCCGCA GTAGGCGGTG ACGGCCCAGG 
CGCGGACGGC CGCGGCACGG GTGGGATCGA 
GTGGCATAGC GGGACACCGT AGGGACCCCG 
GATGTCGGCT CCGCCGCCCG GCGTTCGGCT 
GTCGATGACC TGCTCTTTCA TGGGAGGAGG 
76261 CGGGCAAGCG GAGAGAAGAG GAGGCTGAGG 
76321 AACTGGCGCC TTCTCTTGGG GGAGTTGACG 
76381 CGTCCACAAC GGAAAACTTC TTCCGTCATG 
76441 GCGGCTCGCC CCCATCGGCC ATGCTCCCGG 
76501 GCCCACACGC CCCCGGCAGT CCGCTGCCCG 
76561 CCCCGCGCCA CTGACCACGC ATCGCCGACC 
76621 GCCCGCCGCA TGCTGCCCUC CGCCCTCCAA 
76681 GCACCACCGC TGATCATCAG CAGCGCCCTG 
76741 CTGGCCGCCG CGCTGCTGAT CGCCGGGCTC 
76801 GGCGTCGGCG CCGGGTTGCC CCTCGTCAAC 
76861 CTCGCCACCG CCGCCACCCA GGGGCGCGAC 
76921 CTCGTGGCGG GACTCCTCTG CCTGCTCCTC 

76981 TTCCCACCGG TCGTCAGCGG CTGCGTCATC 
77041 GCCGGCACCT GGGCCCGGGG CGGAGACGCC 
77101 CTGGCCCTGG CGGCGACGAC CCTCGTCATC 
77161 CGCTTCCTCG GGCGGGTCGC CATCCTCATC 
77221 CCGCTGGGCA AGGTCGACCT CGACCCCCTC 
77281 CCCTTCGGCT TCGGCACCCC GCAGTTCGTC 
77341 ATGATCGTGT CCATGATGGA GTCCACCGCC 
77401 CGGCCGGTCC GGGACCGGAC CATCGCCGGA 
77461 CTCGGCGGCG TCCTCGGCTC GTTCACCAGC 
77521 TCCCTCAGCC GGATCCGCAG CCGCTATGTG 
77581 ATGGGCTTCG TGCCCGTCCT GGGCTCGTTC 
77641 GGTGCGGGGG TGGTCTTCTT CGGCTCCGTC 
77701 GCCGCCCTCG GCACCGGACA CAACGCTGTG 
77761 TTCCCCGTCC TGGACCCGGA CTTCTACGCC 
77821 GGCTCGGGGA TCACCGCCGG CTGCCTGGTC 

77881 CTGGGCCGCG GCACCGAGGC CGACCCCGAC 
77941 GACACCGCGG ACACCGTCCT CGGGCCGAAG 
78001 TCCGGCAGCC CCTCCGGCAC CCCTGACCAC 
78061 GCTCCCGCCT GGCCCTACGT GACCGGCCCC 
78121 CGGCCGCACG AGGTTCCGGC GCCGCCCCAC 
78181 CCCTCTGCCG CTCACGAAGG CGAACCCCCG 
78241 GGACCGCTCC ACCCGCTCCA CCCGCTCCAC 
78301 CGGCAACGGC ACAGTGCGGA GGCCGACCCC 
78361 GGCGACAGCC AGTAGAGACG ACCTCCCCCG 
78421 CCCGGCGGAG AGGTGGCGCC GACCCACCCG 
78481 TGTCCCACGG ATTCCCCGGC GACAAGACGA 
78541 CCGCCACCGT GCGGGCGGCT CCCCGCGCGG 
78601 AATCGCCCAG GTCGTGGCCG AGGCCCTCGG 
78661 GCTGCCCGGC CTCGGCACCA CGCGACTCCC 
78721 GATGCTCTGC AACGCCCCCG GCGTCCCCGA 
78781 CGTGACACGG GGCCCCGTCG TGGTCCTCGC 
78841 GCTGGACCGG TACGCCCCCG AGGTCCTGGC 
78901 CGATCTGACC GCCGTCCTCG GGGGTTCCGC 
78961 CTGCACCGAC ACCTTCGACG AGGCGTACTA 
79021 GGCCCGCCAG GCGGGGTCGG CCTGGAGCTT 
79081 CACGACCCTG CGCCGCGAAC TCCGGTCGGG 
79141 CCGCCGGCCC GTCTACGAGG GATCACTGGT 
79201 CGGGGACGCG CCACCCCCGG GGTCGCGTCA 
79261 GCGCATCAGC GTCTCGTGCT GCTGGAGCAG 
79321 CCGGAAGCCC TCGGGTGCGA CATCCGCCCA 
79381 ATCCGTGTCG CCGATCCAGG CGTGCACCGG 
79441 GTAGGTGCTC ACCACGGTGA AGTCCGCCCG 
79501 CGGGTCGTCA AGCAGGGCGG TATCGGTCCC 
79561 GTCGTCCCCC TTCCGGTGCA GGTCGAGCGG 
79621 GACGTGGAGG GCGGCGGGCG TGACGCGGTG 



CGACTGTCGG TTCCTCCAGA ATCCGCGAGG 
AGGTTCAGGC CGGGCCATAA AGTCCCGTCC 
CGAAACCTGT GGTGACAGCC GCCCACCCCC 
TTCGGGCCGT CCCCCGGCAC CGGTCACTCT 
CGGGCGGTGT CTCAAGGACC TGCCTTGCTG 
CCACCCCATC CGGTCGACGA GATCCTTCCC 
CACGTGGCGA GCATGTACGC CGGCCTGACC 
GGGCTGACCC CGGCCCAGCT CTCCGCCCTC 
GGCACGATCG CCCAGACCCT CGGCGTCTAC 
GGCGTCTCGT TCGCCGTCGT GTCACCGGCG 
GGCGCCCTCC CGGCGATCTT CGGGGCCACC 
GCTCCCGTCT TCTGCCGACT GGTCAGGTTC 
ACCCTGGTCG GCATCTCCCT CCTGCCGGTC 
GAAGCCGCCG GCTTCGGCTC CCCCGCCGAC 
ACCCTGACCG TGCACCGCAT GCTCTCGGGC 
GGCATGCTCG CGGGCACCCT GATCGCGATC 
GCCCAGGCGC CCCTCTTCGC CCTGCCCACG 
CCCACCGTGA TCGCCACCGC CGCGGTCGTG 
GCGCTGCTGG CGCTGGGCGC GGTCGCCGAA 
AGCCTCCGCG CCCTCGGCCT CGCCACGGTC 
ACGTCGTACG CGCAGAACGT CGGCCTGGTC 
GTCACGCTCT GCGGCGCCGT CCTCGTCCTG 
GTCGCCCTCG TCCCTCTGCC CGTCCTCGGC 
GCCGTCACGG GCATCCGTAC GCTGGCCAAG 
ATCGTCTCCG TCACCCTCGC CTTCGGTCTC 
CGTCTTCCCG CCCCGGTGGC GACCGTGCTC 
GCGGTCGTCC TCAACTACCT CCTGAACCAC 
GCGATCTCCG CGGAACAGGT CACCGCCCTC 
CGTTCCTCCG ACTGGACGCC CTTCCAGCCC 
GGCCGTCACA CCAGGGGCAC GGCACGGCCC 
GTGGACCCCA CCGACACCGG GCGGCACCAC 
CGACCAGACG AGGTGCCCCC GCCGCTCCAC 
CCCGCCGTCA CCGAGAACGC GGTCTTTCCC 
CCCCGGCCCA CCGGTCGTCC CGACCGTCCC 
TGGCAGCATC CGCAGACCCC CTCCGCATCC 
ACCTCTTCGC AGAGCCCGGC ATCGCACAGG 
ACCACCGCCG GAAGCGCCCC CGGGGACCCG 
GGTAGCCCCG ATGACCACCG TTTCCGCCGC 
CGGCACGTCC CGCCCGGGCC CCGACGAGAG 
ATCCGCCCGG ACGGTCCTCG ACCCGGATGC 
GTTCGGCGAC GGGAGGTTCG ACGCGGCGAT 
CGCGCTCTCG CGGCTCGGGG AACTGCGCCG 
GACCGACCCC TCGCGCGTCC GCTCGTTCTG 
CGTCGAAGCG CGGCGTCATC CGCCGATCGC 
CGAGGTGCGG AGCGTCCCCG TTCCCCTCGA 
CGGAAGACCC GAGAAGCTCC TGGACCCGTC 
CGTGGACGAC CGGGTCCGCG AGGAGTTCGA 
GGAGTGGGAC GAGCGCTTCG GCCACCTCCG 
GATCGTCCGT GCCGTCCCCT GACGTCCTCC 
GCGGTCTCCG GCCAGGTGCC CCGAGAGCTC 
GTAGAAGTGG CCGCCGGGCA GGACCCGTAC 
GGCGTCCATG TCCCCTACCG CCACGTTCGG 
ACAGCCGACG GCGGTGGGGA CGCGGGGGCC 
GACCGCGGGC AGCACGAGCT GTCGGATGTC 
GCCGAGCCCG CGCAGTACGG CGACCAGCTC 
GGTCAGCCGG TGCGGGGCCT TGCGGCTGGA 
CCGCTCCTCC AGGCGCAGCG CCACCTCGTA 
79681 GGCCAGGGAC GCGCCCATGC TGTGGCCGAA GAGCGTCAGG GGCACGTCCG CGAGCGGCAG 
79741 CAGTGCCGCC GTGACCCGGT CGGCGAGCAC GTCCATCCGG TCGACGAACG GCTCGTTGAA 
79801 CCGCTCCTGG CGTCCGGGGT ACTGCGCCAC CAGGACCTCG GTGTCGCCGC CGAAGGCGCT 
79861 GCCCCAGGCG TGGAAGAAGC TCGCGGAACC GCCCGCGTGC GGCAGGACCG CCAGCCGCCG 
79921 CCGGGGCGCG GGAGTGCTGG AGTACCTGCG GAACCACGTC GTGCTGTCCG TGCCGGTCGT 
79981 CATGTGTGCG TACACCCCGT CCTCGGGTTC TTGGGGTGCC AGTGTCCCCG CAGGGCCCGG 
80041 TGTCCGGACG UGGTGGGGGT CCGGTGGCGA GCCGCTTACG TGTCCCGGCG CTTCCGGGAC 
80101 CGGCGGCCGC ACACGTGTCG GCCCCCACGA ACACCAGGGT GCGTGGGGGC CGATGCGTGT 
80161 TTCGAGTCCT GGTCTGACGA TTTCAGGCCG AAAGATATGT CGGACTTTAC AGCTGCGATC 
80221 GAAGCCGATC GATAATGCCG TGGACGGGTA ACGTCGGAAT CACTCGGTGC TCTTGAGCGC 
80281 ACCACTCACG TTGACGACCT CGTGGCACTC CCGCGCCTGC TGTCCCGTCG CGGGCGTACC 
80341 GGGCTTCGTG CAGGTCACGT CGATGGTCAC GGTGTCCTTC GCGCCGATCC TGATGGGAGT 
80401 CACCCAGTGG TAGTCCTGGT TGCGGAACGT CTCCAGCGCG ATCGTGGTGA TCTTGCGGTC 
80461 CCCGAAGGTG ATCGTCATCA CCCCTTCGTC ACCCTGGAAG TTCGCGACAA CGATGTCCGT 
80521 GATCCCGAAC ACGGTCTTCT CGGGGACCGT GTACGTCCCG GTCCTGGACT GTCCCGCTCC 
80581 CGACCTCAGG TCGATGGTGG CCGAGCTCTG CCGTCCCCCG CCGGTCGCGG TGCCGTCGCC 
80641 CGTGCCGGCC GAACCGTCGT CCCCGCCGCC CGCTCCACCG CCCGATCCCG GCTTCTGGCT 
80701 GCTGCCGCCG GACGGCCGGC CCGGGCCGGG GGAGGAGCCG TCCGAGCCGC TCTCCGTCGG 
80761 GACCGGGCGG GGCTGCACCG CCTGGGTGGC CGCCTCCTCG GCGGCGCTGC GCACGGCCGG 
80821 CCGGACCAGC GTGAACCAGG CGATCAGCAG CGCGATCAGC GCCGCGAGCA GCAGGAGCAG 
80881 CCACTTGGGG AAGACCGGTA TCTGGACGAA CTCCGCGTCC AGCGTCGGCG CCGTGTGGGG 
80941 CTCCTCGGCG CGGTCCTCGG TCTGCTCCGG CTCGCCGGTC TCGCGGGCGT CCACGGTGAA 
81001 CGGCCAGACC ACGGGGTCGC CGAACCACAC CGGGCTCGCG GTGCGGACCC GCAGCCGGAG 
81061 TTCCTTCGAC TCGCCCGGCT CCAGCGCCGG CTCCGCCGGC GTGAAGGCGA ACCGGAGCTC 
81121 CTCGCCCGCC TGCCCGGGCG TGAACCCCAC CCGTACCGGG GTGTTGCCCT GGTTGCGGAC 
81181 GGCCAGCAGA TAGCGGCCCC GGAGCCAGCC GCGCCGGCGG CGCGGCGAGA GGTCGGTCCG 
81241 CAGCTCGTGG AACGCGCCGA CGCGCACCAC GGTCTCCAGG ACCTTGACCG ACTCGGGCTG 
81301 CTCGTTCGGG AGGATCCGTA CACCGAGGGG CAGCTCGCCG GCCCGTGTCT CCGGCGAGCG 
81361 CGGCGGTGCC AGACGGAGCG TCACCGTCTC GGACGTGCCG GGATAGAGGG AGAGCCGCTC 
81421 GGGCTCGACG GTGGTCCATT CGGCACCGTC ACCGACGACC TTCAGGTCGT ACGCCTCGAC 
81481 GATGTCACTG TCGTTGCGGA CGGTCAGGGT GGTGGTGGCG ATGTCGCCCG GCGTCACGGA 
81541 CACGGCCGGG ATGTCGAGGC CGGGCGCACC GGGACCGGAG GAGGCTGCGG AAGGCGTCAC 
81601 CCGCCCCACC GTAGGAGACC TGACAGATCC GTACGAGGCA CGCGAGGGCA ATGTCCGGGC 
81661 AGCTCGGCTG CCCGGCAAGC ACAAGTCAAC TCTCCGGTAA CAATGGATTT CTAGTCTGGA 
81721 GAGCCGCCTT CGGCACACCA CCGGCCCGTG GTCGGCTCGT GTCGTGTCCG CCTTCCCCCC 
81781 ACCGACCCAG GAAAACAGGT ATCCGATGTT CCGCACGGAG GAGAAGAGGC CGGTCGCGAC 
81841 CGGCACTACG GCGCATGACG CCGTCCGGGG CCACCCGGAC GCCCATGCCG CCGGCTTCGG 
81901 CCGCCCGCGC CGCGTCACCG TGGCGGTCTA CGCCGCCGAC CCCGTGCTGC GGGTCGGCGT 
81961 CGTCCAACAG CTCCGCCAGC GCCCCGAGAC CGAGCTCGTC GACGACGCGG ACGCGGAGAA 
82021 CGCGCAGGTC TCCCTGGTCG TCGTCGACGC CCTCGACGAC GACGTGACCG CCCTGCTGAC 
82081 CCGGCTGAGC TACAACGGCG CCACCCGCGC GGGACTCGTG ATCGGCACCC TCGGCGTCGG 
82141 GGCGCTCCAA CGCGTCGTCG AGTGCGGGGT GTCGGCGGTG CTGCGCCGCG CCGAGGCCGA 
82201 CCAGGACCAG CTCGTCCAGC TGGTCCTGGC GGTGGCCAAC GGCGAGGGCG TGCTCCCGGG 
82261 CGACCTGCTC GGCGAGTTAC TGGGACACGT CGGCAGCCTG CGCCGCGCGG CCCTCGACCC 
82321 CGGCGCCCTG CCCCTCTCCA CCCTCACCAG CAGGGAGGCG GAGATGCTGC GCCTGGTCTC 
82381 GGAGGGCCTG GACACCGCGG CGATCGCCCG CAAGACCTCG TACTCCGAGC GGACCGTGAA 
82441 GAACGTCCTG CACGAGATCA CCACCCGCCT CCAACTGCGC AACCGCGCCC ACGCCGTGGG 
82501 CTACGCGCTC CGCAACGGGC TGATATGACC GTCCCGTCCG GACCGCGGCC CGGCGGCCGG 
82561 CGCGACAGCC GGAGGGAAGG CGGCGCTGCC CCAAAGTGCA TCCCGCCCTT CCCCCGGGTG 
82621 CGGCCCCCGG GCCCTCCCGC CGCGTGCGCC GCCGCCGCAC GATGACGGGG GGCACCTCCC 
82681 GGTGCCGCAC CGGACGGAGA AGGGCACCGT GATGAAGACC GCTGGCCCCG GTGGACGGCA 
82741 CCGCCGGGGG AGACTCGCCT CGGCGCTCCT GCTGCTCGTC CCCCTGCTGG GCGCGACGGG 
82801 CGTGGCCGGG CCGGACGACC CCCGGACCGC GGCGGCCGCG GCGGACGCCG CCGAGACCAC 
82861 CCGCATCGCC TACGCGGGCA CCGGCCACCG CAGCCTCGGC GAACCGGCCT CCACCGACTC 
82921 CAGCACCCCG CTGTTCGGAG CGGGACCCAC CCACTACGAC ACCGACCCGT CCGCCCTCGG 
82981 CGACCGGCTG GTCTTCGCGA GCCGCCGCGA CGAGAAGCAC CCCCAGATCT ATCTGCGGGG 
83041 CGCCGACGGC GGAGTCCTGC GGCTCACCAG CGGCCTGGAC GCGGCCCGTC CCCGGCTCAC 
83101 CCCGGACGGC GGGTCGGTGC TCTTCGACGC 
83161 CCTGTGGCTG GTGCGCACCG ACGGCACCGG 
83221 CGAGGAGGAC CCGGCGGTCT CCCCCGACGG 
83281 CCCCCTGGCC GGGCGGCAGA TCTACGTCCG 
83341 CACCGACCCG GCCCGCGGCA CGGCCTCCGA 
83401 CAACCGCGCG TGGATCGCGT ACACGTCGAC 
83461 GCTGCGGATC ACCGACGGCA CCACCGACGA 
83521 GCAGGGCCAC GGGGCGGCAT GGCTGCCCGA 
83581 GACCACCTGC ACCTGCAGGA CCCCCTACGA 
83641 CCGGGAACCC TCCCTGGTGC TCGACGAGGA 
83701 CACCGCCGAG GGCGGCCACG CGATCGTCGA 
83761 GACCCTCCAG GACATCCGCG CGGACGGTTC 
83821 GCGCGAGGAC CCCCAGGCCG ACACCAACAC 
83881 CGCGCCCCCG TTCGACCCGT GGACCGAACG 
83941 CGTCCTCACC CGCTTCGAGG GCCCCGACGA 
84001 CGCCGACGGT ACGAACGAGG CGCCGATGCC 
84061 CACCGACCCG ACGTTCTCCC CGGACGGCAC 
84121 CGGGGTCGGC GAGGCCGCGG GAGACAGCCG 
84181 GATCACCGGA GAGATCGTGC CCCCGGCCGG 
84241 CTGGTCCTCC GACGGCACCA CCCTGGCCTT 
84301 CGGCAGCAAG CACGTGTGGA CCGCGTCCAC 
84361 CGCGACGCAC TGCCCGCGCG ACTGCGACGT 
84421 CGGACGCTCC CTCGCCTTCA ACCGCAAGAA 
84481 ACTGCTCCTG ACCACCCTGT CCGGCGACGC 
84541 CGGCCAGGAC GGCGCGTGCG AGCGGGAACT 
84601 GCCGCGCGAC GCCGCCTGGA CCGCCGACGG 
84661 GGCCGCGGTC AACAGCCCGG AGAAGCTGAA 
84721 CCCGCTCACC GCCGAGCTCG CCGGACGCCA 
84781 CCTCGCCGTC GAGGCACCCG CCACGACGCC 
84841 CACCGTCCAC GTGGTCAACC ACGGTCCCGC 
84901 CCCGCCGTCC GGTGTGCGGA TCACCGGGAT 
84961 CTCCCTCCAG TGCGACCTGG GCGTCGTCGA 
85021 GCTCACCGGC GTCACCGCCG GCGACGCACC 
85081 CGACCCCCGG CCCGGCGACA ACGACGGCCG 
85141 GACGCCGACC CCCACGCCGA CGCCGACCCC 
85201 GACCCCCACC CGGACCCCGA CGCCCACCCC 
85261 GCCGAAGGCC GGACCCGGGG TGCGGATCAC 
85321 ACGCGTCGTC GTCACGTACA GCGTCCGCAA 
85381 GCTCAGGATC GGACTGCCCG CCGGGGTGCC 
85441 GAACGGCGCG TGCGCGCTGC CCGACCTCAC 
85501 CCTCAGCCCG AAGAAGGCGA TGACCGCCCG 
85561 GGACGCCGAC CGCAGCGACA ACACCGCCCG 
85621 CGTCGCCGTG CCCGACATCG GCAAGCCCGG 
85681 CCCGCCCGGC GTCCCGGTGC GCTTCAGCTG 
85741 GACCTTCCCG GAGGCCGACG GCACGTTCAT 
85801 GACCGGGCCG CGCACCATCA CGGCCTCGGG 
85861 CCTGGTCGTC AGCGGCACCG TCCAGCCGCC 



CGCCGACCCG GCCGGCGGCT CCCAGCGCGA 
GCTGACCCGG CTGACGGACA CGCCCGCCAG 
CGCCCGGATC GCCTACTCCA GCGACGCCGA 
CGCCCTCACG GGCGGCATCC CCACCCGGCT 
GCCCGCCTGG AACCCCGTCG ACGACGACGT 
CACGACCGAG GACGGGCGGA CCAGGCAGCG 
GACCCTGTTC ACCGGCGCCT ACGCGAACTG 
CGGGGACGGG ATCGTGTTCC TCAGCCCCGA 
CCACGTCTTC CGGTCGGTCG TGCACGCCGA 
CCGCGACGTC CTCTCGCCCA CCTGGATCGG 
GCGCAGCTCG GCGGCGACCG CGCACACGGC 
CGACCCGCGC GACCTGCAGC GGAAGATCCT 
CGACCCCGCC AAGGATCCGC TCTTCCAGCC 
GCAGAACTAC ACCCCCGACG GGCGCCGCCT 
CGCGCGGATC GAGCGGATCT GGACGGCCGA 
CCTCGACGGG CGCGGCGCGC GGGACTGGGA 
CCGCCTGGCC TTCACCCGCA CCTCGCCCGG 
CATCCTCCTC GCCGAGGTCG CCACCGGCCG 
TGAACTCCGC GGCGGGGACG CCCAGCCGAC 
CACCCGCGCC CGGCAGATCG CCGGGGGCGG 
GGCTGACCTG ACCCGGCAGC GCGACCTGAG 
CATCGACGAC AGCCCCGCCT TCTCGCCCGA 
CGGCGGCGGG CGGATCGACG AGCGCAACGG 
CTGCCAGGTC CTGCTGCCCA CCGCCGCCCG 
GCCGGACACC ACGCTCACCG GTCCGCACCA 
CAAGAGGCTG GTCTTCAGCT CCCGGGCCGC 
CGTCCTGGAC GTCGGCTCCG GTGACATCAC 
GAAGGAACCC ACCGTCCAGC AGTCCGTGGA 
CGACGTCACC GTCGGCGCGT CCGGCACGGT 
CGCCTCGCCC GGCACCCGGC TCACCGTCGT 
CGAGTGGCCC GGCGGCACCT GCGACGCCGC 
GGCCGGAGCC CAGGTCCCCG TGGACGTCAC 
CGTCGACTGG TCGGTCACCG GCGCCGTCCT 
GAGCGTGATC CCCGTACGCG AGGCACCCCC 
CACGCCCACC CCGACCCCGA CTCCGACGCC 
GACTCCGACC CGGCCCCCGC AGCCCCCCGC 
CGTCCAGCCC GAGCCCGGCT ACGTCGGCGG 
CGGCCGCAAC GCGCTCGCCA CCGGACTCCG 
CCACGGCGGA CTTCCGGCGG GCTGCGACCG 
CCCGGGCACG ACCGCCGTCC TGCGGGTCGT 
CGTCACGGCC GTGCTCGACA CCACCGGCAC 
GGAGCGGCTG CGCGTCCTCC AGCCGCGCAT 
ATTCGTCACC TCCGTCCGAG GCGTGGACTT 
GAACCCCGGG ATCACCGCCG CCGCCTCGCC 
CGGACAGCTC CTCATCCTCG CCAAGGACCA 
CCCCGGATTC TCCCCGGTGA AGACCGACTT 
GGACGGGGTG ACTCGCCGGT GATCC 
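
The sequence listing above has passed through page extraction, so some position labels appear split (for example "74 5 81" for 74581) and a few residues are not DNA letters. The following Python sketch, which is not part of the original disclosure, shows one way such lines might be normalized and screened. The function names are hypothetical, and the sketch assumes that any leading all-digit tokens on a line belong to the position label.

```python
def parse_listing_line(line):
    """Split one listing line into (position or None, concatenated bases).

    Leading all-digit tokens are joined into one position label, which
    repairs labels that were split by extraction (e.g. "74 5 81").
    """
    tokens = line.split()
    digits = []
    while tokens and tokens[0].isdigit():
        digits.append(tokens.pop(0))
    position = int("".join(digits)) if digits else None
    bases = "".join(tokens).upper()
    return position, bases


def non_dna_characters(bases):
    """Return the sorted set of characters that are not A, C, G or T."""
    return sorted(set(bases) - set("ACGT"))


# A line with a split position label and a line containing a stray "U".
print(parse_listing_line("74 5 81 GACCGTGCCC GTGACCACGG GCGGGAAGAA"))
print(non_dna_characters(parse_listing_line("73201 CCCGGAAGAG TTCGAGCCCG GTCAGTCUGT")[1]))
```

Flagged characters are only reported, not corrected, because the intended base cannot be inferred from the listing alone.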
Example 2 Construction of a "Clean" Host Strain, S. fradiae K159-1 

[0102] This example describes the preparation of the clean host, Streptomyces fradiae K159-1, 
a strain in which the tylGI, tylGII, tylGIII, tylGIV, and tylGV genes have been deleted. Plasmid 
pKOS159-5 was first constructed as follows. Two fragments flanking the tylG genes were PCR 
amplified from S. fradiae genomic DNA using the following primers: 

tylGI left flank: 

forward 5'-TTTGCATGCGATGTTGACGATCTCCTCGTC [SEQ ID NO:_]; 
reverse 5'-GGAAGCTTCATATGTTCTCTCCGGAATGTG [SEQ ID NO:_]; 

tylGVI right flank: 

forward 5'-TTAAGCTTTCTAGAGAGGAGAGGCCGTGAAC [SEQ ID NO:_]; 
reverse 5'-AAAGAATTCGAACTCGAGCACGGACTCGTTG [SEQ ID NO:_]. 

[0103] The Sph I, Hind III and EcoR I restriction sites are underlined. The two fragments 
were then cloned into pSET152 using the underlined restriction sites and corresponding sites in 
pSET152 [see Bierman, M., Logan, R., O'Brien, K., Seno, E.T., Nagaraja, R. & Schoner, B.E. 
(1992). Plasmid cloning vectors for the conjugal transfer of DNA from Escherichia coli to 
Streptomyces spp. Gene 116:43-49]. The resulting plasmid was named pKOS159-5. This 
plasmid no longer contains the int-φC31 gene and attP locus from pSET152 and therefore serves 
as a suicide vector for delivery by homologous recombination. 
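
Because underlining does not survive in plain text, the restriction sites in the four primers can be located by their recognition sequences instead. The following Python sketch is illustrative only and is not part of the original disclosure; the dictionary keys and the function name are hypothetical, while the recognition sequences (SphI GCATGC, HindIII AAGCTT, EcoRI GAATTC) are the standard ones for these enzymes.

```python
# Recognition sequences for the three enzymes named in the text.
SITES = {"SphI": "GCATGC", "HindIII": "AAGCTT", "EcoRI": "GAATTC"}

# Primer sequences from the example, written 5' to 3' without spaces.
PRIMERS = {
    "left_forward": "TTTGCATGCGATGTTGACGATCTCCTCGTC",
    "left_reverse": "GGAAGCTTCATATGTTCTCTCCGGAATGTG",
    "right_forward": "TTAAGCTTTCTAGAGAGGAGAGGCCGTGAAC",
    "right_reverse": "AAAGAATTCGAACTCGAGCACGGACTCGTTG",
}


def find_sites(primer):
    """Return {enzyme: 0-based index} for each listed site found in primer."""
    return {name: primer.find(site) for name, site in SITES.items() if site in primer}


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

Each primer carries exactly one of the three named sites near its 5' end, which is consistent with the cloning of the two flanks as SphI-HindIII and HindIII-EcoRI fragments.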

[0104] Spores of S. fradiae 99 (Russia) were prepared by harvesting from the strain grown on 
1-2 AS-1 plates [see Wilson, V.T.W. and Cundliffe, E. (1998). Characterization and targeted 
disruption of a glycosyltransferase gene in the tylosin producer, Streptomyces fradiae. Gene 214: 
95-100], filtering the spores through sterile cotton, and resuspending in 1 ml of 20% glycerol 
[see Hopwood, D.A., et al. (1985). Genetic Manipulation of Streptomyces: A Laboratory 
Manual. The John Innes Foundation, Norwich, UK]. Spore suspensions were stored at -20 °C. 
A 20 µl aliquot of the spore suspension was added to 5 mL of 2xYT and incubated at 30 °C with 
shaking. After two days, the cultures were diluted 1:50 and incubated at 30 °C with shaking for 
3 h. Then 1 mL of the culture was collected by centrifugation (recipient cells). Donor 
cells were prepared by transforming E. coli S17-1 with pKOS159-5 and selecting for apramycin 
resistance only. Several colonies were picked off the primary transformation plate and used to 
inoculate 5 ml of LB with chloramphenicol (10 µg/ml), kanamycin (100 µg/ml) and apramycin 
(60 µg/ml). The cells were grown at 37 °C for 4 h (OD600 of 0.4-0.6), collected by 
centrifugation, washed in 5 mL LB, centrifuged, and resuspended in 100 µl of LB. Conjugal 
transfer between the donor and recipient cells was performed by resuspending the recipient cells 
in the 100 µl donor suspension and spreading the cells on AS-1 plates. After incubation at 37 °C 
for 16-20 h, the plates were overlaid with 3 mL of soft nutrient agar containing 1 mg 
nalidixic acid and 1.5 mg apramycin. Exconjugants were observed after 48 h of further 
incubation at 30 °C. 

[0105] Apramycin resistant colonies were analyzed by PCR to confirm single crossovers at 
both flanking regions. One colony was selected for carrying out a double crossover as follows. 
The strain was grown on AS-1 plates non-selectively until well-sporulated. Spores were 
harvested, dilutions were plated on AS-1 plates, and single colonies were screened for loss of 
apramycin resistance. A single apramycin sensitive colony was isolated which did not produce 
tylosin. The double crossover was confirmed by PCR. This strain was named S. fradiae 
K159-1. 

[0106] Streptomyces fradiae K159-1 was deposited under the terms of the Budapest Treaty 
with the American Type Culture Collection, 10801 University Blvd., Manassas, VA, 20110-2209, 
on 12 March 2003, with accession number PTA-5060. 

Example 3 Construction of S. fradiae K159-1/244-17a, a "Clean" Host Expressing 
Methoxymalonyl Biosynthetic Enzymes 
[0107] Streptomyces fradiae K159-1/244-17a is derived from strain K159-1 (Example 2) by 
addition of the fkbGHIJK genes from Streptomyces hygroscopicus var. ascomyceticus ATCC 
14891, which encode proteins catalyzing the biosynthesis of methoxymalonyl-ACP. 
[0108] The putative methoxymalonyl-ACP biosynthetic genes from S. hygroscopicus ATCC 
14891 (fkbGHIJK) are arranged with the 3' end of fkbG (encoding an O-methyl transferase) 
overlapping by 6 codons the 3' end of fkbH (of unknown function), which is the last 
gene of a convergent operon that begins with fkbB (one of the PKS genes) and ends with the 
genes fkbK, J, I and H. To facilitate expression of these genes in S. fradiae, an operon was 
constructed beginning with fkbK and ending with fkbG, all in the same direction. This was done 
using PCR to clone fkbG with flanking restriction sites to allow its 5' end, with its existing 
Shine-Dalgarno sequence, to be fused to the 3' end of fkbH. This operon was then placed behind 
the tylG promoter in a pSAM2-based vector, which was introduced into S. fradiae clean host 
K159-1 by conjugation. Exconjugants were selected and named K159-1/244-17a. 
[0109] Streptomyces fradiae K159-1/244-17a was deposited under the terms of the Budapest 
Treaty with the American Type Culture Collection, 10801 University Blvd., Manassas, VA, 
20110-2209, on 12 March 2003, with accession number PTA-5053. 

Example 4 Construction of an Operon Containing All Five Chalcomycin PKS Genes and 
Expression in S. fradiae 
[0110] A construct comprising the genes chmGI-V was constructed as follows. 
The 3' end of chmGV was obtained by PCR with pKOS146-185.10 as the template and the 
following primers (Chalco-IA: GACACGGCCGGTGAGAGCAGC [SEQ ID NO:_] and 
Chalco-IB: CTTCTAGATGTCGCGGTGTACGG [SEQ ID NO:_]). The 942 bp PCR product 
was digested with NcoI and XbaI, the 309 bp fragment was gel isolated, and the fragment ligated 
into the same sites of Litmus29 to give pKOS342-33. That plasmid was cut with NcoI and XhoI 
and ligated to a 2.4 kb NcoI-XhoI fragment from pKOS146-185.10 to give pKOS342-35. That 
plasmid was digested with BglII and XhoI and ligated with a 6.4 kb BglII-XhoI fragment 
(including chmGIV and the 5' region of chmGV) from pKOS146-185.10 to create pKOS342-36 
(containing chmGIV and GV). 

[0111] A 5.4 kb HindIII/PstI fragment containing the 5' half of chmGIII was isolated from 
pKOS146-185.1 and a 6.3 kb PstI/BglII fragment containing the 3' half of chmGIII was isolated 
from pKOS146-185.10. These two pieces were ligated into Litmus28 cut with HindIII and BglII 
to obtain pKOS342-38. Plasmid pKOS342-36 was cut with BglII and SpeI, the 9 kb fragment 
was gel isolated and the fragment ligated to the BglII and SpeI sites of pKOS342-38 to obtain 
pKOS342-39. 

[0112] Plasmid pKOS232-172 (described in Example 5), containing chmGI and GII, was cut 
with NdeI and HindIII and the 19 kb fragment was isolated. Plasmid pKOS342-39 was digested 
with HindIII and SpeI and the 20 kb fragment was isolated. These two fragments were then 
ligated into the vector portion of an expression cosmid, pKOS244-20 (gel isolated 8 kb NdeI-SpeI 
fragment). The resulting plasmid (pKOS342-45) was recovered using in vitro λ phage 
packaging (Stratagene) and infection of E. coli DH5α. The correct clone was identified by 
restriction enzyme analysis and the plasmid was moved into E. coli DH5α/pUB307 and 
conjugated into S. fradiae. 
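
The fragment sizes given in this example can be summed to estimate the size of the assembled cosmid. The following Python sketch is illustrative only and is not part of the original disclosure. The 37-52 kb packaging window used here is an approximate, commonly quoted range for λ heads and is an assumption, as are the function names.

```python
def construct_size_kb(fragments_kb):
    """Sum the sizes (in kb) of the fragments joined in one ligation."""
    return sum(fragments_kb.values())


def fits_lambda_head(size_kb, low_kb=37.0, high_kb=52.0):
    """Check a construct size against an assumed lambda packaging window."""
    return low_kb <= size_kb <= high_kb


# Fragment sizes as stated in Example 4.
example_4 = {
    "chmGI-GII, NdeI-HindIII": 19.0,
    "chmGIII-GV, HindIII-SpeI": 20.0,
    "pKOS244-20 vector, NdeI-SpeI": 8.0,
}

total = construct_size_kb(example_4)
print(total, fits_lambda_head(total))
```

Under these assumptions the three-piece ligation gives a construct of about 47 kb, a size at which recovery by in vitro packaging is a natural choice.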

[0113] Expression of PKS genes in S. fradiae (in this and the following examples) was 
under the control of the tylosin PKS promoter (tylGIp; see Rodriguez et al., "Rapid engineering 
of polyketide overproduction by gene transfer to industrially optimized strains" J Ind Microbiol 
Biotechnol). 

[0114] Apramycin resistant colonies were obtained, shown to secrete bioactive compounds, 
and grown vegetatively in 5 mL TSB medium (+ 30 µg/ml apramycin) at 30°C for 48 h. Two 
mL of seed culture was inoculated into 40 ml Russia (R) medium (15 g/L whole wheat flour, 10 
g/L corn gluten hydrolyzate (Sigma), 25 g/L beet molasses, 2.5 g/L brewer's yeast, 1 g/L 
(NH4)2HPO4, 1 g/L NaCl, 2 g/L CaCO3, and 34 g/L soybean oil) in 250 ml baffled shake 
flasks. After 7 days of growth at 30°C, the culture broth was analyzed for 16-membered 
macrolide production by HPLC (Metachem Metasil Basic column, 4.6x150 mm, 5 µm particle) 
using a linear gradient from 15 to 100% organic phase (56% methanol, 5 mM ammonium acetate) 
at 1 ml/min over 7 min. The HPLC used simultaneous detection by electrospray mass 
spectrometry (Turbo Ionspray source) and UV absorption at 282 nm. LC-MS analysis of the 
broth showed that several chalconolide derivatives were produced. The most abundant 
compounds were purified and shown to have the structures below. The 3-keto also forms the 
enol tautomer. 
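
The linear gradient described above (15 to 100% organic phase over 7 min) can be expressed as a simple function of run time. The following Python sketch is illustrative only and is not part of the original disclosure; the function name is hypothetical, and holding the composition constant outside the 0-7 min window is an assumption.

```python
def percent_organic(t_min, start=15.0, end=100.0, duration_min=7.0):
    """Percent organic phase at time t_min for a linear gradient.

    The composition is held at `start` before time zero and at `end`
    after the gradient finishes (an assumption, not stated in the text).
    """
    if t_min <= 0:
        return start
    if t_min >= duration_min:
        return end
    return start + (end - start) * t_min / duration_min


# Composition at one-minute intervals across the gradient.
for t in range(0, 8):
    print(t, round(percent_organic(t), 1))
```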




[Structures: 524 amu compound; 668 amu compound] 



Example 5 Construction of Streptomyces fradiae K232-192 Expressing a Hybrid 
Chalcomycin-Spiramycin PKS 
[0115] Streptomyces fradiae K232-192 is derived from strain K159-1/244-17a (Example 3) 
by addition of hybrid chalcomycin-spiramycin PKS genes, which encode proteins catalyzing the 
biosynthesis of 14-methylplatenolide. The chalcomycin genes were obtained from cosmid 
pKOS146-185.1, which was deposited under the terms of the Budapest Treaty with the American 
Type Culture Collection, 10801 University Blvd., Manassas, VA, 20110-2209, on 19 February 
2003, with accession number PTA-4961. 

[0116] The first two genes of the chalcomycin PKS were isolated from the cosmid 
pKOS146-185.1 as EcoRI/XhoI and XhoI/BspHI fragments, and a coding sequence for a 
spiramycin PKS C-terminal linker was attached to the 3' end. The EcoRI site is near the 5' end of 
chmGI. The EcoRI/XhoI fragment was cloned into a modified Litmus28 with a synthetic linker 
inserted in order to create an appropriate translation start sequence. The altered region of the 
Litmus28 polylinker between the AflII and EcoRI sites in this plasmid (pKOS232-165) is given 
below. The plasmid with the chmG fragment was pKOS232-168A. 

AflII   PacI    SD      NdeI        EcoRI 
CTTAAGGGTTAATTAAGGAGGACACATATGTCCGGAGAATTC [SEQ ID NO. ] 
                           M  S  G  E  F 
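
The annotated linker can be checked by locating each recognition sequence and translating from the ATG inside the NdeI site. The following Python sketch is illustrative only and is not part of the original disclosure; the function names are hypothetical, and the codon table is the standard genetic code written in TCAG order.

```python
from itertools import product

LINKER = "CTTAAGGGTTAATTAAGGAGGACACATATGTCCGGAGAATTC"

# Standard recognition sequences for the enzymes labelled in the text.
SITES = {"AflII": "CTTAAG", "PacI": "TTAATTAA", "NdeI": "CATATG", "EcoRI": "GAATTC"}

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): AMINO[i] for i, c in enumerate(product("TCAG", repeat=3))}


def site_positions(seq):
    """Return {enzyme: 0-based index of its first occurrence} for sites in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def translate_from_ndei(seq):
    """Translate in frame from the ATG contained in the NdeI site (CATATG)."""
    start = seq.find("CATATG") + 1 + 2  # skip "CAT"; the ATG begins 3 bases in
    codons = [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]
    return "".join(CODONS[c] for c in codons)


print(site_positions(LINKER))
print(translate_from_ndei(LINKER))
```

The translation reproduces the M S G E F reading shown under the sequence, with the EcoRI site supplying the final two codons so that the cloned chmGI fragment continues in frame.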

[0117] The XhoI/BspHI fragment was ligated between the XhoI and NcoI sites of Litmus28 
to give pKOS232-156. The two fragments were then joined to give pKOS232-172. 
[0118] Starting with overlapping cosmid pKC1306 (described in US Pat No. 5945320 to Eli 
Lilly Company), a cassette containing the last three ORFs of the spiramycin PKS was 
constructed as follows. An AvrII site was introduced at the 3' end of srmG5 by PCR from a 
natural MluI site to the 3' end. The PCR product was cut with MluI and AvrII, gel isolated and 
ligated into a Litmus-based vector (pKOS232-75B) between the same sites to give pKOS231-118A. 
The 7 kb BamHI/MluI fragment from cosmid pKC1306 was subcloned in Litmus38 
(New England Biolabs) to give pKOS231-113A. The 3.8 kb BamHI/MluI fragment of 
pKOS231-118A was gel isolated and ligated with the 7 kb BamHI/MluI fragment of pKOS231-113A 
to give pKOS231-120. The 7 kb BsrGI/BamHI fragment from pKC1306 was subcloned in 
Litmus38 to give pKOS231-113B. The 6.2 kb PstI/BamHI fragment from pKOS231-113B was 
cloned into Litmus28 to give pKOS231-122. The 7.5 kb BamHI/AvrII fragment was isolated 
from pKOS231-120 and ligated with pKOS231-122, which was cut with BamHI and AvrII and 
dephosphorylated, to give pKOS231-124. The 3.1 kb BamHI/SpeI fragment from pKOS231-118B 
(which contained a PCR fragment that created a 5' end for srmG3) and the 7.5 kb 
BamHI/AvrII fragment from pKOS231-120 were isolated and ligated to give pKOS231-130. 
The 14 kb BamHI fragment was isolated from pKC1306 and subcloned in Litmus28 to give 
pKOS231-111B. The 14 kb BamHI fragment was isolated from pKOS231-111B and ligated to 
pKOS231-130 cut with BamHI and dephosphorylated, to give pKOS231-132. 
[0119] To attach a coding sequence for a spiramycin PKS C-terminal linker to the 3' end of 
chmGII, a HindIII site was introduced at the 3' end of srmG2 using PCR with pKOS231-112B as 
template. The engineered HindIII site was positioned with respect to the reading frame to match 
that of the natural HindIII site in the chalcomycin chmGII gene. The resulting PCR product was 
cut with HindIII and BamHI (a natural site) and ligated into the same sites of pKOS231-114A to 
give pKOS232-178. This was then joined to pKOS231-132 at the BsrGI site to give pKOS232-182. 
The chmG1,2 cassette was isolated from pKOS232-172 as an 18 kb NdeI/HindIII fragment 
and the srmG3,4,5 cassette was isolated from pKOS232-182 as a 20 kb HindIII/AvrII fragment. 
These fragments plus a pSET152-based vector having the tylG promoter (the vector portion gel 
isolated from pKOS244-20) were joined in a three-piece ligation and recombinants were 
recovered by in vitro lambda phage packaging and infection of E. coli. Correct constructs were 
identified by restriction analysis (pKOS232-184A) and transferred into E. coli DH5α/pUB307. 
[0120] The resulting E. coli DH5α/pUB307 cells were conjugated with S. fradiae 
K159-1/244-17a (Example 3) to produce Streptomyces fradiae K232-192. Streptomyces fradiae 
K232-192 was deposited under the terms of the Budapest Treaty with the American Type Culture 
Collection, 10801 University Blvd., Manassas, VA, 20110-2209, on 12 March 2003, with 
accession number PTA-5052. Conjugation was performed as described in Practical 
Streptomyces Genetics (Kieser et al., 2000) except that plates were left overnight at 37°C before 
overlaying with the selective agents (apramycin and nalidixic acid). Apramycin resistant 
exconjugants were streaked for single colonies and a set of clones was patched onto R5 plates 
and inoculated into tryptic soy broth (40 ml in 250 ml shake flasks). Both the solid and liquid 
media contained apramycin (to select for pKOS232-184A) and kanamycin (to select for 
pKOS244-17A). Liquid and solid cultures were grown at 30°C. Agar plugs taken from most 
patches on R5 showed bioactivity when placed on an M. luteus test lawn. The agar was 
extracted with ethyl acetate and found to contain a compound of 730 amu. TSB seed cultures at 
2-3 days were used to inoculate fermentation media and these cultures were grown for 7-10 days 
at 28°C. Upon extraction with ethyl acetate the 730 amu compound (730-I) was isolated and its 
structure verified by NMR as shown below. In addition, LC-MS analysis of the filtered culture 
broth showed abundant production of a 586 amu and a 730 amu (730-II) compound, and a 714 
amu compound and a smaller amount of a 904 amu compound (most likely representing 
14-methyl-platenolide with all three sugars attached). Thus, the chalcomycin-spiramycin hybrid 
PKS synthesized the predicted 14-methyl platenolide. 




[Structures: 714 amu compound; 904 amu compound] 

Example 6 Construction of Streptomyces fradiae K344-51 Expressing a Hybrid 
Chalcomycin-Spiramycin PKS and the chmH Gene 
[0121] The chmH gene was cloned from cosmid pKOS146-185.1, which was deposited 
under the terms of the Budapest Treaty with the American Type Culture Collection, 10801 
University Blvd., Manassas, VA, 20110-2209, on 19 February 2003, with accession number 
PTA-4961. 

[0122] A 6.3 kb EcoRI fragment containing the chmH gene and a small downstream 
ferredoxin gene was cloned from cosmid pKOS146-185.1 into Litmus28 to give pKOS344-10B. 
This was then cut with SacI and religated to give pKOS344-016 having a 2.1 kb insert. An NdeI 
site was introduced at the start of translation and an internal NdeI site was simultaneously 
replaced with a PstI site (without changing the amino acid sequence) in a three-piece ligation 
using two PCR products ligated between the EcoRI and BamHI sites of pKOS344-016 to give 
pKOS344-022B. The unique FseI site in pKOS344-022B was changed to an XbaI site with a 
synthetic linker, and the chmH gene plus ferredoxin gene were excised with NdeI and XbaI and 
ligated into the expression vector, pKOS342-108D, between the NdeI and AvrII sites to give 
pKOS344-037B. This vector was transferred into DH5α/pUB307 and conjugated into K232-192 
(Example 5). Exconjugants were selected with thiostrepton and streaked for single colonies 
to yield S. fradiae K344-51. 
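
The ligation of the NdeI/XbaI fragment between the NdeI and AvrII sites of the vector relies on XbaI and AvrII leaving matching cohesive ends. The following is a minimal sketch of that compatibility check, assuming the standard recognition sequences and cut positions for these enzymes; the names `ENZYMES`, `overhang`, and `compatible` are illustrative and do not appear in the specification.

```python
# Recognition site and top-strand cut offset for each enzyme
# (standard values, stated here as assumptions of this sketch).
ENZYMES = {
    "NdeI": ("CATATG", 2),   # CA^TATG
    "XbaI": ("TCTAGA", 1),   # T^CTAGA
    "AvrII": ("CCTAGG", 1),  # C^CTAGG
}


def overhang(name):
    """Return the single-stranded overhang left by a palindromic cutter."""
    site, cut = ENZYMES[name]
    # The bottom strand is cut at the mirror-image position of the top strand.
    lo, hi = sorted((cut, len(site) - cut))
    return site[lo:hi]


def compatible(a, b):
    """Ends can anneal when their overhangs are identical.

    All enzymes listed above leave 5' overhangs, so overhang polarity
    is not checked separately in this sketch.
    """
    return overhang(a) == overhang(b)


print(overhang("XbaI"), overhang("AvrII"), compatible("XbaI", "AvrII"))
```

Under these assumptions XbaI and AvrII both leave a CTAG overhang, whereas NdeI leaves TA, so the fragment can insert in only one orientation.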

[0123] The vector for integration of chmH, pKOS342-108D, uses the int and att functions of 
Streptomyces phage φBT1. All S. fradiae strains were plated on AS1 agar for sporulation, R5 
agar for solid media production, or grown in liquid TSB for vegetative growth and Russia 
medium for production. All appropriate antibiotics for selection of integrated markers were 
added to the media, except for the production stage. 

[0124] Expression of the chmH gene (with its downstream ferredoxin) dramatically 
increased hydroxylation of the 14-methyl. A 906 amu compound was detected. This is the 
structure expected when all post-PKS tailoring of the tylosin pathway has occurred, along with an 
additional reduction adding two hydrogen atoms. The methanol adduct characteristic of the 
aldehyde is not seen for the 906 amu compound, and reduction of the aldehyde most likely 
accounts for the addition of the two hydrogen atoms. 




[Structure: 906 amu compound]



[0125] In addition, there appears to be a significant amount of 730 amu (730-1) compound 
that has the aldehyde and is hydroxylated on the 14-methyl. This is deduced from the presence of 
the methanol adduct (762 amu) and the fact that the 730 amu compound now elutes later from 
the C18 column compared with the 730 amu carboxylic acid seen prior to expression of chmH. 
[0126] Without intending to be bound by a specific mechanism, Figure 2 shows proposed 
pathways for post-PKS modification of the chalcomycin-spiramycin hybrid PKS macrolide 
product in the absence or presence of ChmH. When ChmH is present, the post-PKS reaction 
sequence from the Chm/Srm hybrid essentially follows that for tylosin and gives the 904 amu 
structure, which is converted by reduction of the aldehyde to a 906 amu compound. This 
reduction of the aldehyde has been described for tylosin (to give relomycin). Knockout of genes 
for allose biosynthesis or its transfer (tylJ) would give the demycinosyl compound of 730 amu 
(with the 14-hydroxymethyl and the aldehyde). 
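
The mass relationships cited in this example can be checked with nominal-mass arithmetic. The sketch below assumes integer nominal masses (H = 1, C = 12, O = 16); the function names are illustrative only and are not part of the specification.

```python
# Nominal (integer) atomic masses, assumed for this sketch.
H, C, O = 1, 12, 16

# Methanol, CH3OH: one carbon, four hydrogens, one oxygen.
METHANOL = C + 4 * H + O


def reduce_aldehyde(mass):
    """Reduction of an aldehyde to an alcohol adds two hydrogen atoms."""
    return mass + 2 * H


def methanol_adduct(mass):
    """Addition of one methanol molecule to an aldehyde-bearing compound."""
    return mass + METHANOL


print(reduce_aldehyde(904))   # 904 amu structure after aldehyde reduction
print(methanol_adduct(730))   # methanol adduct of the 730 amu aldehyde
```

With these assumptions the reduced product is 906 amu and the methanol adduct is 762 amu, matching the values reported above.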

Example 7 Expression of a chmGI-GII operon with the tylG2 C-terminal linker in S. fradiae 
K105-2 

a) Construction of the S. fradiae tylD knockout strain K168-173 

[0127] The tylD knockout plasmid was constructed from two PCR products encompassing 
1.8 kb regions upstream and downstream of the tylD gene using PCR primers that introduced 
new restriction sites. The upstream PCR product was cut with EcoRI and PstI and the 
downstream product was cut with PstI and SphI. These were then ligated together between the 
EcoRI and SphI sites of pUC19 and the sequence was verified. The resulting plasmid, 
pKOS168-106, has about 80% of the tylD gene deleted between the artificial PstI sites. This 
plasmid was introduced into S. fradiae by conjugation from E. coli DH5α/pUB307 and 
apramycin-resistant exconjugants were obtained. Three were found by PCR to be the result of 
homologous recombination at the expected tylD locus and these were grown in the absence of 
selection and screened for the second crossover. Apramycin sensitive clones were isolated and 
some were found that produced demycinosyltylosin (DMT) by LC-MS analysis of the 
fermentation broths. The strain was designated S. fradiae K168-173. 

b) Construction of S. fradiae tylD K105-2 

[0128] The S. fradiae DMT (demycinosyltylosin) producer (K168-173) described above was 
used to introduce a KS-1 null mutation in the tylosin PKS. The plasmids pKOS168-190 and 
pKOS268-145 were digested with EcoRI and EcoRV and the 6.2 kb and 2.6 kb fragments, 
respectively, were gel isolated and ligated together to give pKOS264-65. A mutation was 
introduced into pKOS264-65 using PCR to change the active site cysteine of the tylosin KS1 to 
alanine, with the simultaneous introduction of an NheI site, to give pKOS325-8. Finally, 
pKOS325-8 and pKOS241-52 were digested with PvuII and XbaI and ligated together to give 
pKOS264-76. Plasmid pKOS264-76 was conjugated into the DMT producer strain S. fradiae 
K168-173 (Example 5) from E. coli DH5α/pUB307 and exconjugants were selected for 
apramycin resistance. Clones that underwent the correct first crossover event were identified by 
Southern blot analysis and one of these was propagated without selection to allow a second 
crossover. DNA from clones that had become apramycin sensitive was digested with XmaI/NheI 
and analyzed by Southern blot. Three clones had the pattern consistent with that expected for the 
desired second crossover to leave the KS1-null mutation in the chromosome. This strain was 
designated S. fradiae K105-2, and was shown to produce no tylosin-related structure, but could 
convert O-mycaminosyl-tylonolide (OMT) into demycinosyl-tylosin (DMT). 

c) Construction of a chmGI-GII operon with the tylG2 C-terminal linker 
[0129] The tylG2 C-terminal linker region was isolated by PCR from a pSET-based vector 
encoding the entire tylosin PKS (pKOS168-190). The primers used were TylLink-A: 
5'-TGAAGCTTCCCGCCACGCTGGTG-3' [SEQ ID NO:_] and TylLink-B: 
5'-CGTCTAGACAGGGTGTGAAACCG-3' [SEQ ID NO:_]. This created a HindIII site at the 
same position of the encoded sequence corresponding to the natural HindIII site in the linker 
region of chmGII. The amplimer was cut with HindIII and XbaI and ligated into Litmus29 to 
give pKOS342-78. The tylosin linker region of pKOS342-78 was excised using HindIII and 
XbaI, and then ligated into HindIII- and XbaI-digested pKOS232-172 (Example 5) to create 
pKOS342-82. This hybrid piece was then cut out with NdeI and XbaI and ligated to a pSET-
based vector (pKOS232-189) cut with NdeI and SpeI to create the pSET152-based expression 
vector, pKOS342-84. 
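
The restriction sites carried by the two linker primers can be located with a simple string search. This is a minimal sketch, assuming the standard recognition sequences for HindIII (AAGCTT) and XbaI (TCTAGA); the primer sequences are those given above, and the helper name `find_site` is illustrative.

```python
# Primer sequences as given in the text (5' to 3').
TYLLINK_A = "TGAAGCTTCCCGCCACGCTGGTG"
TYLLINK_B = "CGTCTAGACAGGGTGTGAAACCG"

# Standard recognition sequences, assumed for this sketch.
SITES = {"HindIII": "AAGCTT", "XbaI": "TCTAGA"}


def find_site(primer, enzyme):
    """Return the 0-based index of the enzyme's site in the primer, or -1."""
    return primer.find(SITES[enzyme])


for name, primer in (("TylLink-A", TYLLINK_A), ("TylLink-B", TYLLINK_B)):
    for enzyme in SITES:
        print(name, enzyme, find_site(primer, enzyme))
```

Under these assumptions TylLink-A carries the HindIII site and TylLink-B carries the XbaI site, each preceded by two 5' bases, consistent with the HindIII/XbaI digestion of the amplimer described above.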

d) Expression of a chmGI-GII operon with the tylG2 C-terminal linker in S. 
fradiae K105-2 

[0130] The expression vector pKOS342-84 was transferred to E. coli DH5α/pUB307 and 
conjugated into S. fradiae K105-2 (this example, above). Apramycin-resistant colonies were 
isolated and fermented in production medium. The broth was analyzed by LC-MS and found to 
contain the compound shown below. The chm/tyl hybrids differ from the chm/srm hybrids only 
by having a 4-methyl in place of a 4-methoxy, apparently making the chm/tyl hybrids good 
substrates for the TylH hydroxylase. 




[Structure: 714 amu compound]

*** 

[0131] All publications and patent documents cited herein are incorporated herein by 
reference as if each such publication or document was specifically and individually indicated to 
be incorporated herein by reference. 

[0132] Although the present invention has been described in detail with reference to specific 
embodiments, those of skill in the art will recognize that modifications and improvements are 
within the scope and spirit of the invention, as set forth in the claims which follow. Citation of 
publications and patent documents is not intended as an admission that any such document is 
pertinent prior art, nor does it constitute any admission as to the contents or date of the same. 
The invention having now been described by way of written description, those of skill in the art 
will recognize that the invention can be practiced in a variety of embodiments and that the 
foregoing description is for purposes of illustration and not limitation. 